ARABIC LEXICON

IDF INNOV



27 November 2015


Fields

NICT
Language, Writings, Art & Culture

Sectors

NICT
Education
Business, Finance & Management

In most languages, common nouns, adjectives and verbs can take many different forms in a sentence, depending on the grammatical rules of the language. This is especially true of Arabic, where a single root of three consonants can generate hundreds of different forms. Traditional dictionaries cover only a small fraction of the forms found in texts. Our technology has been used to generate a database of 65,000 entries with their 6 million forms, covering more than 98% of the forms found in any sort of text (literature, newspaper articles, etc.); the remaining 2% includes proper names. Arabic Lexicon interfaces with Unitex, an open-source corpus processing system for language processing developed by the Gaspard Monge Laboratory (LIGM, UPEM). Unitex Arabic has been presented to prestigious organizations such as the Al-Ghazali Institute of the Grande Mosquée de Paris and the Institut du Monde Arabe. It can now be used in a wide range of domains, such as text editors, digitization of printed documents, data mining of Arabic web content and e-learning of Arabic.
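As a rough illustration of how a full-form database of this kind can be queried, the sketch below indexes a handful of inflected forms and measures what share of a token list they cover. It assumes a simplified `form,lemma.CODE` line layout, similar in spirit to the inflected-form dictionaries used with Unitex but without escaping rules or inflectional features. The sample entries (given in transliteration), the function names and the token list are all hypothetical and are not taken from the Arabic Lexicon database.

```python
from collections import defaultdict

# Illustrative entries only (transliterated); not taken from the actual database.
SAMPLE_LINES = [
    "kataba,kataba.V",
    "yaktubu,kataba.V",
    "kitaab,kitaab.N",
    "kutub,kitaab.N",
]


def load_forms(lines):
    """Map each inflected form to the set of (lemma, code) pairs it can represent."""
    index = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Assumed layout: inflected form, comma, lemma, dot, grammatical code.
        form, _, rest = line.partition(",")
        lemma, _, code = rest.partition(".")
        index[form].add((lemma, code))
    return index


def coverage(tokens, index):
    """Return the fraction of tokens found in the form index (0.0 if no tokens)."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in index)
    return known / len(tokens)


index = load_forms(SAMPLE_LINES)
tokens = ["yaktubu", "kutub", "Paris", "kitaab"]
print(coverage(tokens, index))  # 3 of the 4 tokens are known forms
```

In this toy run the unknown token is a proper name, which mirrors the kind of word the description places in the uncovered remainder.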

Competitive advantages:

  • Accuracy
  • Exhaustiveness
  • Responsiveness

Applications:

  • Spelling correction
  • Automatic word completion while typing
  • E-reputation analysis of websites
  • E-learning of the Arabic language
  • Digitization of documents

Keywords: Semitic languages, Arabic, Orthography, Grammar, Unitex
