Scanned documents can contain handwritten text, such as annotations, signatures, or filled-in blanks in forms. Separating these handwritten parts from the printed parts is a preliminary step in the document processing chain.
To this end, state-of-the-art approaches generally run several classifiers and compare their results at the end of the classification process. Machine learning algorithms can also be used, but they usually require very large training databases.
Our approach is based on a dialogue between two agents, one assessing the linearity of the writing and the other its regularity. After an optional learning phase (for example, to integrate the specific characteristics of a given database), the two agents interact to dynamically update their decision models and finally reach a common decision, leading to better results than state-of-the-art tools. Without any learning phase, our system reaches an overall recognition rate of 92.3% on the IAM database.
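As an illustration only, the dialogue between the two agents could be sketched as follows. This is not the actual implementation: the `Agent` class, the `decide` function, the scores in [0, 1], the thresholds, the update rate, and the rule that the less confident agent adjusts its model are all assumptions made for the example. The sketch also assumes that high linearity and high regularity indicate printed text.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent judging one feature score in [0, 1]."""
    name: str
    threshold: float = 0.5  # assumed initial decision boundary
    rate: float = 0.1       # assumed step size for model updates

    def opinion(self, score: float) -> tuple[str, float]:
        # Assumption: a high score (very linear / very regular) means printed.
        label = "printed" if score >= self.threshold else "handwritten"
        confidence = abs(score - self.threshold)
        return label, confidence

    def concede(self, score: float) -> None:
        # Nudge the boundary toward the score, so that similar inputs
        # are more likely to receive the other agent's label next time.
        self.threshold += self.rate * (score - self.threshold)


def decide(linearity_agent: Agent, regularity_agent: Agent,
           linearity: float, regularity: float) -> str:
    """Return a common label; on disagreement the less confident agent adapts."""
    label_l, conf_l = linearity_agent.opinion(linearity)
    label_r, conf_r = regularity_agent.opinion(regularity)
    if label_l == label_r:
        return label_l
    if conf_l >= conf_r:
        regularity_agent.concede(regularity)
        return label_l
    linearity_agent.concede(linearity)
    return label_r


lin = Agent("linearity")
reg = Agent("regularity")
first = decide(lin, reg, linearity=0.9, regularity=0.4)
second = decide(lin, reg, linearity=0.2, regularity=0.1)
```

In this toy version, the agents disagree on the first text block, the more confident linearity agent prevails, and the regularity agent shifts its threshold slightly. No prior training data is involved, which mirrors the case where the learning phase is skipped.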
Our system has been developed for Latin alphabets but may be adapted to other writing systems, such as Arabic or Asian scripts.
Competitive advantages:
Applications:
Keywords: Agent technology, Optical Character Recognition (OCR), Document analysis