Information retrieval, text analysis and automatic synthesis

Horacio Rodríguez

This line of research is particularly important because its three tasks are the subject of active work by leading NLP research groups worldwide. More importantly, it underpins the other lines of research, most of which depend on prior text retrieval and processing.

Information retrieval (IR), of both text and multimedia resources, covers collection indexing and document or passage retrieval, whether from previously indexed collections or through Internet wrappers.

Document analysis involves recognizing and extracting written text and pre-processing it (word and sentence segmentation, morpho-syntactic analysis and disambiguation, the detection and classification of noun phrases, shallow and deep syntactic analysis, semantic analysis, coreference resolution, etc.).
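
As an illustration of the first pre-processing steps only, the following minimal sketch splits a text into sentences and then into tokens. The regular expressions and the function names `split_sentences` and `tokenize` are assumptions made for this example; they are deliberately naive (abbreviations such as "e.g." will be split incorrectly) and do not represent the tools used by the group.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Naive word segmentation: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


if __name__ == "__main__":
    for sentence in split_sentences("Hello, world. How are you?"):
        print(tokenize(sentence))
```

Later stages such as morpho-syntactic tagging or parsing would take these token lists as input.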

The tasks that make up this line of research are:

  • Classification of documents and passages
  • Clustering of documents
  • Detection of subject matter in documents and collections
  • Detection of links to and in documents
  • Measurement of distances (semantic or distributional) between language units, etc.
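
To make the last item concrete, here is a minimal sketch of one possible distributional distance: the cosine distance between bag-of-words vectors of two text fragments. The whitespace tokenization and the function name `cosine_distance` are assumptions made for this example; the measures actually studied in this line of research may differ.

```python
import math
from collections import Counter


def cosine_distance(a: str, b: str) -> float:
    """Return 1 - cosine similarity of the word-count vectors of a and b."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    # Dot product over the words the two fragments share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    if norm == 0:
        return 1.0  # Convention for this sketch: an empty fragment is maximally distant.
    return 1.0 - dot / norm


if __name__ == "__main__":
    print(cosine_distance("the cat sat", "the cat slept"))
    print(cosine_distance("the cat sat", "stock markets fell"))
```

Fragments with no words in common get distance 1, and identical fragments get a distance of (approximately) 0.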

The automatic production of summaries is also tackled along several dimensions: monolingual, multilingual and cross-lingual summaries; single- and multi-document summaries; text and speech summaries; extractive and abstractive summaries; generic summaries; and guided summaries based on the questions, profiles or interests of users.

Several approaches are taken in the work on summaries, including lexical chains, machine learning, and the measurement of relevance and redundancy.
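
The interplay of relevance and redundancy can be sketched with a greedy extractive selector in the style of maximal marginal relevance. This is an illustrative assumption, not the group's method: relevance is approximated here as cosine similarity to the whole document, redundancy as the highest similarity to any sentence already selected, and the names `select_sentences` and `trade_off` are hypothetical.

```python
import math
from collections import Counter


def _vec(text: str) -> Counter:
    return Counter(text.lower().split())


def _cos(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def select_sentences(sentences: list[str], k: int = 2, trade_off: float = 0.5) -> list[str]:
    """Greedily pick up to k sentences, rewarding relevance and penalizing redundancy."""
    vecs = [_vec(s) for s in sentences]
    doc = Counter()
    for v in vecs:
        doc.update(v)  # Word counts of the whole document.
    chosen: list[int] = []
    while len(chosen) < min(k, len(sentences)):
        best, best_score = None, -math.inf
        for i, v in enumerate(vecs):
            if i in chosen:
                continue
            relevance = _cos(v, doc)
            redundancy = max((_cos(v, vecs[j]) for j in chosen), default=0.0)
            score = trade_off * relevance - (1 - trade_off) * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    text = ["cats chase mice", "cats chase mice", "dogs bark loudly"]
    print(select_sentences(text, k=2))
```

Raising `trade_off` favours sentences that are central to the document; lowering it favours sentences that differ from those already selected.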