The UPC Intonation Toolkit (mCART)

 

mCART is a complete intonation model training package developed at the TALP Research Center of the Universitat Politècnica de Catalunya during the TC-STAR (Technology and Corpora for Speech to Speech Translation) project.

This software eases the generation of an intonation model for the prosody module of text-to-speech systems. The trained model generates a fundamental frequency (F0) contour specific to the input text that is to be synthesized. The generation process uses information provided by upstream components, such as syllabification, stress, phonetic transcription, part-of-speech tagging, syntactic analysis and prosodic boundaries.

Three different mathematical formulations are implemented: Bezier, Fujisaki and Tilt. Each formulation can be trained with either of the two available procedures: SbS and JEMA. Three training modes are available: train and test, n-fold cross-validation, and full training. Some of these modes can be used for research purposes to study the performance of each training method.

 


Installation

Obtain a copy of the software.

Uncompress it (tar zxf upc_intonation_toolkit.tgz). This creates the directory upc_intonation_toolkit, referred to below as $INTDIR.

Compile the code:
  • cd $INTDIR/prj
  • make release (or make debug for unoptimized code with debugging symbols)
The mCART program should now be in $INTDIR/bin/release (or $INTDIR/bin/debug).

Add this directory to the PATH environment variable.

Execute mCART -h for a help message describing the parameters.
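
The installation steps above can be collected into a short shell script. This is a minimal sketch, not part of the toolkit: it assumes the archive sits in the current directory, that the build places the binary under the toolkit directory in bin/release (or bin/debug), and it uses the illustrative variable names ARCHIVE, BUILD and MCART_BIN, which do not come from the distribution.

```shell
#!/bin/sh
# Sketch of the installation steps; archive location and paths are assumptions.
ARCHIVE=upc_intonation_toolkit.tgz       # assumed to be in the current directory
BUILD=${BUILD:-release}                  # set BUILD=debug for a debug build
INTDIR="$PWD/upc_intonation_toolkit"     # directory created by the archive

# Unpack and compile only if the archive is actually present.
if [ -f "$ARCHIVE" ]; then
    tar zxf "$ARCHIVE"
    (cd "$INTDIR/prj" && make "$BUILD")
else
    echo "$ARCHIVE not found; skipping unpack and build" >&2
fi

# Prepend the binary directory to PATH unless it is already listed.
MCART_BIN="$INTDIR/bin/$BUILD"
case ":$PATH:" in
    *":$MCART_BIN:"*) ;;
    *) PATH="$MCART_BIN:$PATH"; export PATH ;;
esac

# Print the help message if the build produced the program.
if command -v mCART >/dev/null 2>&1; then
    mCART -h
fi
```

The PATH change only lasts for the current shell. To keep it, source the script instead of executing it, or add the export line to your shell start-up file.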

Documentation

The documentation can be downloaded from here (it is also distributed with the code, inside the docs directory). It is in PDF format, which can be viewed with a number of applications.

It describes the techniques implemented by mCART, the technical background of the different algorithms, and the input data files required for the training process.

 

Download

This toolkit is made available under the terms of the GNU Lesser General Public License (LGPL) and it can be downloaded from here.



Authors

Pablo Daniel Agüero
Javier Pérez
Antonio Bonafonte

This work has been funded by the European Union under the integrated project TC-STAR - Technology and Corpora for Speech to Speech Translation (IST-2002-FP6-506738).