Handwritten / printed text discrimination using agent technology (hapedate)

ERGANEO



21 September 2016


Fields

NICT

Sectors

NICT

Scanned documents can contain handwritten text, such as annotations, signatures, or filled-in blanks in forms. Distinguishing these handwritten parts from the printed parts is a preliminary step in the document processing chain.
To this end, state-of-the-art approaches generally run several classification tools and compare their results at the end of the classification process. Machine learning algorithms can also be used, but they usually require very large training databases.
Our approach is based on a dialogue between two agents: one assesses the linearity of the writing, the other its regularity. After an optional learning phase (for example, to integrate the specific characteristics of a given database), the two agents interact to dynamically update their decision models and reach a common decision, leading to better results than state-of-the-art tools. Without any learning phase, our system reaches an overall recognition rate of 92.3% on the IAM database.
Our system has been developed for Latin alphabets but may be adapted to other writing systems, such as Arabic or Asian scripts.
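As a rough illustration only, and not the actual implementation, the idea of two agents exchanging opinions before a joint decision could be sketched in Python as follows. Everything in this sketch is an assumption made for the example: the representation of a text line as a list of `(x, bottom_y, height)` component boxes, the two scoring formulas, the constant `4.0`, the update rule in `dialogue`, and the default `decide` function. In particular, the agents here only adjust their scores, which is a much simpler stand-in for updating full decision models.

```python
from statistics import mean, pstdev

# Hypothetical representation: a text line is a non-empty list of component
# boxes (x, bottom_y, height) with positive heights.

def linearity_score(boxes):
    """Score in [0, 1]; 1 means the component bottoms lie on a straight line."""
    xs = [b[0] for b in boxes]
    ys = [b[1] for b in boxes]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    # Residuals of a least-squares baseline, normalised by the mean height.
    residuals = [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]
    spread = pstdev(residuals) / mean(b[2] for b in boxes)
    return max(0.0, 1.0 - 4.0 * spread)  # 4.0 is an arbitrary scaling constant

def regularity_score(boxes):
    """Score in [0, 1]; 1 means all components have the same height."""
    heights = [b[2] for b in boxes]
    variation = pstdev(heights) / mean(heights)
    return max(0.0, 1.0 - 4.0 * variation)

def dialogue(p_lin, p_reg, rounds=5, rate=0.5):
    """In each round, the less confident agent moves its score toward the other."""
    for _ in range(rounds):
        c_lin, c_reg = abs(p_lin - 0.5), abs(p_reg - 0.5)
        if c_lin < c_reg:
            p_lin += rate * (p_reg - p_lin)
        elif c_reg < c_lin:
            p_reg += rate * (p_lin - p_reg)
    return p_lin, p_reg

def classify(boxes, decide=lambda a, b: (a + b) / 2 >= 0.5):
    """Label a line after the dialogue, using a replaceable decision function."""
    p_lin, p_reg = dialogue(linearity_score(boxes), regularity_score(boxes))
    return "printed" if decide(p_lin, p_reg) else "handwritten"

# Made-up sample lines, for illustration only.
printed_line = [(0, 100, 20), (15, 100, 20), (30, 100, 20), (45, 100, 20)]
handwritten_line = [(0, 100, 18), (14, 109, 30), (33, 96, 14), (50, 111, 26)]

print(classify(printed_line))
print(classify(handwritten_line))
```

The `decide` argument is where a context-dependent rule could be plugged in. For example, passing `decide=lambda a, b: min(a, b) >= 0.5` would label a line as printed only when both agents agree.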

 

Competitive advantages:

  • No need for prior knowledge of the text layout (especially in the case of printed forms)
  • No need for databases
  • No need for deep learning algorithms
  • Customizable decision function depending on context

 

Applications:

  • Printed text detection and extraction for Optical Character Recognition (OCR)
  • Handwritten text extraction from filled-in printed forms
  • Automated detection of manually annotated printed documents

 

Keywords: Agent technology, Optical Character Recognition (OCR), Document analysis
