Machine learning techniques can exploit corpora to learn the dependency structures of natural-language sentences, which exhibit a significant amount of non-local phenomena.
In the field of human-machine interaction, it is becoming increasingly important that computers adapt to human needs: they should form an integral part of the way humans communicate, without demanding undue effort from users. This implies a need for multimodal user interfaces that have robust perceptive capacities and that use non-intrusive sensors. The TALP Center is working on a set of acoustic scene analysis systems that have a number of perceptive and cognitive functionalities. To do so, it is researching speech and audio processing technologies that make it possible to identify speakers, recognize speech, localize and separate acoustic sources, detect and classify noise, etc.
A recently built intelligent room in building D5 at the Center is the testing ground for these applications. It is equipped with audio and video equipment and is designed for lecturers to give presentations and seminars. We are working to make advances in the multimodal approach, specifically in the integration of audio and video platforms, which is being done by taking advantage of a collaboration that is already in place with the Image Processing Group from the Department of Signal Theory and Telecommunications at the UPC. This research is currently being undertaken under the auspices of the European framework project CHIL and the CICyT ACESCA project.
In the recent past, wideband speech coders have been used at bit rates of between 8 kbps and 32 kbps. Current studies focus on the robustness of these coders (and their variants), which are standardized in various mobile phone applications. This robustness is assessed in two clearly defined ways.
The first way consists of adding one or more noise reduction steps to overcome background noise present in the speech. The goal is, when noisy speech is received, to make the output of the speech coder resemble the output for clean speech (without noise) as closely as possible.
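A noise-reduction front end of this kind can be sketched, in a deliberately minimal form, as a frame-based noise gate. The frame length, threshold, and the assumption that the first frames contain only background noise are illustrative choices, not the actual method used by the coders described here.

```python
# Minimal, hypothetical sketch of a noise-reduction pre-processing step:
# a frame-based noise gate that silences frames whose energy is close to
# an estimated noise floor. All parameter values are illustrative.

def frame_energies(samples, frame_len=160):
    """Split the signal into frames and compute per-frame average energy."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    return frames, [sum(x * x for x in f) / max(len(f), 1) for f in frames]

def noise_gate(samples, frame_len=160, noise_frames=5, margin=2.0):
    """Attenuate frames whose energy is close to the estimated noise floor.

    The noise floor is estimated from the first `noise_frames` frames,
    assumed to contain background noise only.
    """
    frames, energies = frame_energies(samples, frame_len)
    noise_floor = sum(energies[:noise_frames]) / max(noise_frames, 1)
    out = []
    for f, e in zip(frames, energies):
        if e < margin * noise_floor:      # likely a noise-only frame
            out.extend(0.0 for _ in f)    # silence it
        else:                             # likely a speech frame
            out.extend(f)
    return out
```

Real systems would use spectral methods (e.g. spectral subtraction or Wiener filtering) rather than hard gating, but the structure — estimate the noise, then attenuate it frame by frame — is the same.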
The second way consists of assessing the degradation experienced by certain coders when there are bit errors in the transmission channel. In mobile telephone systems, a margin of error is allowed in the bit error rate during the transmission of speech channels, which degrades the decoded (reconstructed) speech at the receiver. Under this assumption, the aim is to reduce the coders' sensitivity to such channel errors.
Knowledge representation involves the modeling of systems that use artificial intelligence to process information. The form it takes basically depends on the task for which the knowledge in question is required. Therefore, the type and quantity of knowledge to perform a task are taken into consideration, as is the approach adopted to code and store it.
Large-scale lexical-semantic ontologies, rule systems and computational lexicons with different content and targets (verb diathesis models, total and partial grammars, selectional restrictions, etc.) are the most commonly used structures.
A very active current line of research is the building and expansion of these ontologies by automatic and semiautomatic means: the syntactic and semantic analysis of large quantities of text makes it possible to learn and acquire new concepts and to create new relationships between them, which then go on to form part of the knowledge stored in the ontology.
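One classic, simple instance of this kind of acquisition is extracting hyponym/hypernym pairs from text with Hearst-style lexical patterns. The sketch below implements a single such pattern ("X such as Y and Z"); the pattern set and the example text are toy assumptions, and real systems combine syntactic analysis with many patterns.

```python
import re

# Illustrative sketch of semi-automatic ontology expansion using one
# Hearst-style pattern. Limitation: the repeated comma group captures
# only its last item, so long enumerations are only partially extracted.
PATTERN = re.compile(r"(\w+)\s+such as\s+(\w+)(?:,\s*(\w+))*(?:\s+and\s+(\w+))?")

def extract_hypernyms(text):
    """Return (hyponym, hypernym) pairs found by the pattern."""
    pairs = set()
    for m in PATTERN.finditer(text):
        hypernym = m.group(1)
        for hyponym in m.groups()[1:]:
            if hyponym:
                pairs.add((hyponym, hypernym))
    return pairs
```

Each extracted pair is a candidate is-a relation that a human validator (or a confidence filter) would then accept into the ontology.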
The development of efficient speech dialog systems involves choosing suitable dialog strategies that are able to ask the right questions and return the information requested by users. The problem here lies in the fact that there are no set methods or clear criteria for outlining a good strategy. The criteria applied by the UPC for designing a dialog are based on extremely simple concepts:
Besides these basic principles of design, there are two significant factors that condition the development of dialogs: firstly, the range of application scenarios that must be resolved (i.e. the design of a dialog is determined by a system’s scope of application) and secondly, the performance of the speech recognition systems used.
In view of the above concepts, in most cases the dialog systems developed favor a certain style of control. These systems guarantee improved robustness by minimizing the number of mistakes or omissions made by users, in exchange for less freedom and a loss of naturalness.
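Such a system-directed style of control can be sketched as a minimal slot-filling loop: the system asks one question per missing slot and only accepts answers recognized with sufficient confidence, re-asking otherwise. The slot names, prompts, and confidence threshold below are invented for illustration and are not taken from any TALP system.

```python
# Hypothetical sketch of a system-initiative (directed) dialog manager:
# one question per missing slot, with a confidence gate on each answer.

SLOTS = ["origin", "destination", "date"]
PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "date": "When do you want to travel?",
}

def next_action(state):
    """Return the next system prompt, or a closing message when all slots are filled."""
    for slot in SLOTS:
        if slot not in state:
            return PROMPTS[slot]  # directed question targeting one slot
    return "Searching trains from {origin} to {destination} on {date}.".format(**state)

def update(state, slot, value, confidence):
    """Fill a slot only if recognition confidence is high enough;
    otherwise the slot stays empty and will be re-asked (implicit repair)."""
    if confidence >= 0.5:
        state[slot] = value
    return state
```

Constraining the user to one slot per turn is exactly the robustness-for-naturalness trade described above: recognition errors are minimized because the system always knows what kind of answer to expect.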
Some of the most common dialog strategies designed to increase robustness and naturalness in these kinds of deterministic systems include:
In information extraction (IE), the work carried out focuses on the use of automatic learning techniques to overcome the main drawbacks of applying IE — its inherent dependence on a domain — by reducing the need for supervision. Specifically, work is being carried out to design pattern acquisition methods for IE in restricted and unrestricted domains (whether the texts are structured or unstructured), document clustering techniques (since the unsupervised learning of IE patterns in open domains may require this preliminary step) and robust methods for extracting information from different media (both texts and transcriptions of spoken language).
Information retrieval (IR), both of texts and multimedia resources, is an important part of the processes of collection indexing and document or passage retrieval (based on previously indexed collections or by means of Internet wrappers).
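The two IR stages mentioned — indexing a collection and retrieving documents against it — can be illustrated with a toy inverted index. Whitespace tokenization and term-match counting are deliberate simplifications, not the retrieval model used by the Center's systems.

```python
from collections import defaultdict

# Toy sketch of collection indexing and document retrieval:
# an inverted index mapping terms to document ids, queried by term overlap.

def build_index(docs):
    """Map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def retrieve(index, query):
    """Rank documents by how many query terms they contain (ties by id)."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Passage retrieval works the same way with passages rather than whole documents as the indexing unit, which is what makes an index directly usable by a question answering system.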
The TALP Center is actively involved in question answering (Q&A) tasks. As a result of its work, the group has a multilingual question answering system, which it entered in the 2003 and 2004 TREC competitions — in the open domain category for English — and in the 2003 and 2004 CLEF competitions — also in the open domain category, but for Spanish. It has also designed a geography demonstrator in Spanish for the ALIADO project in a restricted-domain environment, and took part in the first GEOCLEF competition in 2005 for the same domain in English. In addition, it is now trying to extend the capacity of its current Q&A system so that it can handle oral questions about facts, lists, definitions, information and biographies, and it is also endeavoring to extend the system's multilingual capacities to Catalan.
The automatic production of summaries is also tackled at various levels: monolingual, multilingual and cross-lingual summaries; mono- and multi-document summaries; text and speech summaries; extract and abstract summaries; general summaries; and guided summaries based on the questions, profiles or interests of users.
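The simplest of the summary types listed above — a mono-document extract — can be sketched as frequency-based sentence selection. The stopword list and scoring scheme below are illustrative assumptions, not a description of the Center's summarizers.

```python
from collections import Counter

# Hypothetical sketch of extractive summarization: score each sentence by
# the corpus frequency of its content words, keep the top k in source order.

STOPWORDS = {"the", "a", "of", "and", "to", "is", "in"}

def extract_summary(sentences, k=1):
    """Return the k highest-scoring sentences, preserving original order."""
    words = [w for s in sentences for w in s.lower().split() if w not in STOPWORDS]
    freq = Counter(words)

    def score(s):
        return sum(freq[w] for w in s.lower().split() if w not in STOPWORDS)

    top = sorted(sentences, key=score, reverse=True)[:k]
    return [s for s in sentences if s in top]
```

Abstract summaries, by contrast, require generating new text rather than selecting existing sentences, which is why they are listed as a separate (and harder) variant.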
Document analysis involves recognizing and extracting written text and pre-processing it (lexical and sentence segmentation, morpho-syntactic analysis and disambiguation, the detection and classification of noun phrases, superficial and deep syntactic analysis, semantic analysis, resolution of cross-references, etc.).
The tasks that make up this line of research are:
Saga
Phonetic transcription of Spanish dialect varieties
FestCat: Catalan extension of the Festival TTS
This package provides software and data to extend the Festival TTS to the Catalan language.
Intonation Toolkit
Intonation Toolkit for Text-To-Speech Systems
Voice Conversion Toolkit
Toolkit for Linear Prediction (LPC) and Harmonic Stochastic (HSM) Voice Conversion
MARIE
An Ngram-based Statistical Machine Translation Decoder
AlignmentSet
Library and command-line utilities to manage sets of sentence pairs aligned at a word or phrase level.
IQMT
Open Source Framework for MT Evaluation.
FreeLing
Open source suite of Language Analyzers.
Omlet & Fries
Open source libraries providing Machine Learning and Feature Extraction facilities.
SVMTool
Open source generator of sequential taggers based on Support Vector Machines.
Sibyl - QA engine for spoken documents.
AEDL - Acoustic event detection and localization in a multimodal room
ALICE - Integration of several state-of-the-art technologies related to spoken language and natural language processing used in Intelligent Computer Assisted Language Learning (ICALL) systems
Asiya - An Open Toolkit for Automatic Machine Translation (Meta-)Evaluation
AVIVAVOZ - Voice translation technologies: recognition, corpus-based statistical machine translation, and synthesis.
Bidirectional machine translator between Spanish and Catalan: segment from the program Àgora (March 2009), in Flash format
DIGUI - Flexible DIalogues Guiding the User Interaction for accessing web services