Linguistic feature identification tool

This Work Package covers a set of NLP tasks: language identification, use of ontologies, and entity recognition. Together, these components fall under the general term of information extraction, and they interact with one another. Language identification triggers the NLP engine for entity extraction, links the lexemes in a text to ontological instances, and serves as a feature in categorization. Entity extraction relies on the ontological instances linked to the lexemes in a text to achieve higher accuracy and more focused identification of entities. Taken together, the NLP features form part of the feature set that the categorization engine (see the report on classification models) uses to build its statistical vectors. These tasks are critical for the project because they are the first building blocks in identifying radical and terrorist messages on the Internet.
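The interaction between these components can be illustrated with a minimal sketch. It is not the INT implementation: the stopword lists, the ontology entries, and all function names (identify_language, analyze, to_features) are hypothetical. Language identification is approximated by stopword overlap, and the trained entity recognizer is replaced by a plain ontology lookup. The only point is the data flow: the detected language selects the resources, lexemes are linked to ontological instances, entities are derived from those links, and everything is flattened into features for a categorizer.

```python
from dataclasses import dataclass, field

# Hypothetical toy resources; none of these entries come from the report.
STOPWORDS = {
    "en": {"the", "and", "of", "in", "is"},
    "fr": {"le", "la", "et", "de", "est"},
}

# Per-language ontology lexicon: lexeme -> ontological instance (illustrative).
ONTOLOGY = {
    "en": {"paris": "City/Paris", "interpol": "Organization/Interpol"},
    "fr": {"paris": "City/Paris", "interpol": "Organization/Interpol"},
}


@dataclass
class Analysis:
    language: str
    links: dict = field(default_factory=dict)     # lexeme -> ontological instance
    entities: list = field(default_factory=list)  # (lexeme, ontology class)


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".,;:!?").lower() for t in text.split()]


def identify_language(tokens):
    """Pick the language whose stopword list overlaps most with the tokens."""
    return max(STOPWORDS, key=lambda lang: sum(t in STOPWORDS[lang] for t in tokens))


def analyze(text):
    tokens = tokenize(text)
    # Step 1: language identification selects the language-specific resources.
    lang = identify_language(tokens)
    lexicon = ONTOLOGY[lang]
    # Step 2: link each known lexeme to its ontological instance
    # (repeated lexemes collapse into a single link).
    links = {t: lexicon[t] for t in tokens if t in lexicon}
    # Step 3: derive entities from the links; the class is the part before "/".
    entities = [(t, inst.split("/")[0]) for t, inst in links.items()]
    return Analysis(lang, links, entities)


def to_features(analysis):
    """Flatten the analysis into a sparse feature dict for a categorizer."""
    feats = {f"lang={analysis.language}": 1}
    for _, cls in analysis.entities:
        key = f"entity_class={cls}"
        feats[key] = feats.get(key, 0) + 1
    for inst in analysis.links.values():
        feats[f"instance={inst}"] = 1
    return feats


if __name__ == "__main__":
    result = analyze("The office of Interpol is in Paris.")
    print(result.language, result.entities)
    print(to_features(result))
```

In this sketch the language, the entity classes, and the linked instances all end up as entries in one sparse feature dictionary, which mirrors how the NLP outputs are described as inputs to the statistical vectors of the categorization engine.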
For this work package, an extensive terrorism and radicalism ontology, building on the core IP of INT, has been enhanced. The language identification models remain those used by the INT technology. The entity recognition component has been re-trained during the project to improve accuracy.

D2.3 Linguistic feature identification tool