Information extraction and question answering systems

Alicia Ageno

 

The TALP Center is actively involved in question answering (Q&A). Its work has produced a multilingual question answering system, which the group entered in the 2003 and 2004 TREC competitions (open-domain category, English) and in the 2003 and 2004 CLEF competitions (open-domain category, Spanish). For a restricted-domain setting, the group designed a geography demonstrator in Spanish for the ALIADO project and took part in the first GEOCLEF competition in 2005, working in the same domain in English. It is now extending the current Q&A system to handle spoken questions about facts, lists, definitions, information and biographies, and is also working to extend the system’s multilingual coverage to Catalan.

In information extraction (IE), the work focuses on using machine learning techniques to overcome the main drawbacks of applying IE, in particular its inherent dependence on a domain, by reducing the need for supervision. Specifically, the group is working on pattern acquisition methods for IE in restricted and unrestricted domains (for both structured and unstructured texts), document clustering techniques (since unsupervised learning of IE patterns in open domains may require clustering as a preliminary step) and robust methods for extracting information from different media (both written texts and transcriptions of speech).