FRENCH TREEBANK

ERGANEO



31 January 2020


Fields

NICT; Mind, Language and Education; Language, Writings, Art & Culture

Sectors

NICT

Analyzing and reproducing natural language requires understanding the meaning of each sentence. To meet this need, the French Treebank, a corpus of more than 20,000 richly annotated French sentences, serves as a reference lexical and syntactic resource for linguists and computer scientists, particularly for automatic natural language processing.

Competitive advantages

- Quality of the corpus: annotations are produced by automatic tools, then corrected by hand in several successive passes over the different annotations
- Available in four formats: XML (original format), Tiger-XML (the most complete format, including the components of compound words), PTB (constituent annotations), CoNLL (dependency annotations)
- Rich annotation: domain, author, and date; compound words (and their components); 218 morpho-syntactic labels; grammatical functions; and syntactic constituent trees
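
As an illustration of how the dependency annotations might be consumed, here is a minimal Python sketch that reads CoNLL-style text. It assumes the common 10-column, tab-separated CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with a blank line between sentences; the exact column layout of the French Treebank export should be checked against the distributed files. The `Token` type, the `read_conll` helper, and the sample sentence with its labels are hypothetical and are not taken from the corpus.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int   # 1-based position in the sentence
    form: str    # surface word
    pos: str     # coarse part-of-speech tag (column 4 in CoNLL-X)
    head: int    # index of the governing token, 0 for the root
    deprel: str  # dependency label


def read_conll(text: str) -> List[List[Token]]:
    """Parse CoNLL-X style text: one token per line, blank line between sentences."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        current.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample; tags and labels are illustrative, not copied from the corpus.
SAMPLE = "\n".join([
    "\t".join(["1", "Le", "le", "D", "DET", "_", "2", "det", "_", "_"]),
    "\t".join(["2", "chat", "chat", "N", "NC", "_", "3", "suj", "_", "_"]),
    "\t".join(["3", "dort", "dormir", "V", "V", "_", "0", "root", "_", "_"]),
])

sentences = read_conll(SAMPLE)
for tok in sentences[0]:
    # Resolve the head index to the governing word, or ROOT for index 0.
    head = "ROOT" if tok.head == 0 else sentences[0][tok.head - 1].form
    print(f"{tok.form} --{tok.deprel}--> {head}")
```

Each token carries its head index and dependency label, so a sentence can be turned into a dependency graph with a single pass over the lines.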

Applications

- Automatic natural language processing
- Semantic web, search engines
- Human-machine dialogue (chatbots)
- Spellchecking
- Machine translation
- Language teaching

Keywords: Lexical resources, Syntactic resources, NLP
