The UPC Intonation Toolkit (mCART)
mCART is a complete intonation model training package developed at the TALP Research Center of the Universitat Politècnica de Catalunya during the TC-STAR - Technology and Corpora for Speech to Speech Translation project.
This software eases the generation of an intonation model for the prosody module of Text-to-Speech systems. The model generates a fundamental frequency contour specific to the input text that is to be synthesized. The generation process uses information provided by upstream components, such as syllabification, stress, phonetic transcription, part-of-speech tagging, syntactic analysis and prosodic boundaries.
Three different mathematical formulations are implemented: Bezier, Fujisaki and Tilt. Each formulation can be trained with either of the two available procedures: SbS and JEMA. Several training modes are available: train and test, n-fold cross-validation and full training. Some of these modes can be used for research purposes to study the performance of each training method.
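As an illustration of the n-fold cross-validation mode mentioned above, the following is a minimal sketch of how a corpus can be partitioned into folds so that every sentence is used for testing exactly once. It is not mCART's implementation: the function name n_fold_splits, the contiguous (unshuffled) fold layout and the corpus size of 10 sentences are assumptions made only for this example.

```python
def n_fold_splits(n_items, n_folds):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Hypothetical helper for illustration only; it does not reflect
    how mCART partitions its training data internally.
    """
    if not 2 <= n_folds <= n_items:
        raise ValueError("need 2 <= n_folds <= n_items")
    indices = list(range(n_items))
    # Spread the remainder over the first folds so that fold sizes
    # differ by at most one item.
    base, extra = divmod(n_items, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Example: a corpus of 10 sentences (assumed size) split into 3 folds.
splits = list(n_fold_splits(10, 3))
for fold, (train, test) in enumerate(splits):
    print(f"fold {fold}: train on {len(train)} sentences, test on {test}")
```

In a setup like this, a model would be trained once per fold on the train indices and evaluated on the held-out test indices, and the per-fold errors averaged. Full training corresponds to using all sentences for training with no held-out set.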
Installation
Obtain a copy of the software.
Uncompress it (tar zxf upc_intonation_toolkit.tgz). This creates the directory upc_intonation_toolkit, referred to below as $INTDIR.
Compile the code:
- cd $INTDIR/prj
- make release (or make debug for unoptimized code with debugging symbols)
The mCART program should now be in $INTDIR/bin/release (or $INTDIR/bin/debug).
Add this directory to the PATH environment variable.
Execute mCART -h for a help message describing the parameters.
Documentation
The documentation describes the techniques implemented by mCART, the technical background of the different algorithms, and, in detail, the input data files required for the training process.
Download
References
- Prosody Generation for Speech-to-Speech Translation
Pablo Daniel Agüero, Jordi Adell, Antonio Bonafonte
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006. Toulouse, France. May 2006
- Facing data scarcity using variable feature vector dimension
Pablo Daniel Agüero, Antonio Bonafonte
Speech Prosody 2006. Dresden, Germany. May 2006
- Consistent Estimation of Fujisaki's Intonation Model Parameters
Pablo Daniel Agüero, Antonio Bonafonte
SPECOM 2005. Patras, Greece. October 2005
Authors
Pablo Daniel Agüero
Javier Pérez
Antonio Bonafonte
This work has been funded by the European Union under the integrated project TC-STAR - Technology and Corpora for Speech to Speech Translation (IST-2002-FP6-506738).