The UPC Voice Conversion Toolkit
This toolkit provides two different methods for voice conversion. It was developed at the TALP Research Center of the Universitat Politècnica de Catalunya (UPC) within the TC-STAR (Technology and Corpora for Speech to Speech Translation) project. The first method is a C/C++ tool based on the linear prediction (LPC) model, whereas the second is a MATLAB tool based on the harmonic/stochastic model (HSM).
Linear Prediction based Voice Conversion (LPC)
In this method, classification and regression trees (CARTs) split the acoustic space into several classes according to phonetic features. For each class, the LSF (line spectral frequency) coefficients are transformed with a GMM-based linear regression. Then, for each converted frame, the appropriate residual is chosen from the residuals found in the training data, according to the similarity between the LSF vector associated with each residual and the transformed LSF vector. The toolkit includes code for all of these steps, along with sample scripts that automate the whole process.
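The per-class LSF transformation and the residual selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the toolkit's C/C++ code: it uses the common GMM-based regression in which each mixture component contributes a linear mapping weighted by its posterior probability given the source frame, covariances are assumed diagonal, the CART class of each frame is assumed to be already known, and residual similarity is assumed to be plain Euclidean distance between LSF vectors. The names Component, convert_lsf, select_residual and convert_frame are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Component:
    """One component of a joint source/target GMM (diagonal covariances: an assumption)."""
    weight: float        # mixture weight
    mu_x: List[float]    # source-speaker LSF mean
    mu_y: List[float]    # target-speaker LSF mean
    var_x: List[float]   # diagonal of the source covariance
    cov_yx: List[float]  # diagonal of the target/source cross-covariance


def log_gauss_diag(x: Sequence[float], mu: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mu, var))


def posteriors(x: Sequence[float], comps: List[Component]) -> List[float]:
    """P(component | x), computed in the log domain for numerical stability."""
    logs = [math.log(c.weight) + log_gauss_diag(x, c.mu_x, c.var_x) for c in comps]
    top = max(logs)
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert_lsf(x: Sequence[float], comps: List[Component]) -> List[float]:
    """Posterior-weighted sum of the per-component linear regressions."""
    y = [0.0] * len(x)
    for p, c in zip(posteriors(x, comps), comps):
        for d in range(len(x)):
            y[d] += p * (c.mu_y[d] + c.cov_yx[d] / c.var_x[d] * (x[d] - c.mu_x[d]))
    return y


def select_residual(y: Sequence[float], training: List[Tuple[List[float], object]]):
    """Return the training residual whose associated LSF vector is closest to y."""
    def dist(pair):
        return sum((a - b) ** 2 for a, b in zip(pair[0], y))
    return min(training, key=dist)[1]


def convert_frame(x: Sequence[float], phone_class: str,
                  models: Dict[str, List[Component]],
                  training: List[Tuple[List[float], object]]):
    """Pick the GMM of the frame's class, transform the LSFs, then pick a residual."""
    y = convert_lsf(x, models[phone_class])
    return y, select_residual(y, training)
```

For instance, with a single component where mu_x = [0.5], mu_y = [1.0], var_x = [0.25] and cov_yx = [0.125], a source value of 0.7 is mapped to 1.0 + (0.125 / 0.25) * 0.2 = 1.1. Replacing the hard class decision or the distance measure only changes which GMM is used and how select_residual ranks candidates.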
Installation
Obtain a copy of the software.
Uncompress it (tar zxf upc_vc_lpc_toolkit.tgz). This creates the directory upc_vc_lpc_toolkit, referred to below as $VCDIR.
Compile the code:
- cd $VCDIR/prj
- make release (or make debug for unoptimized code with debugging symbols)
The programs should now be in $VCDIR/bin/release (or $VCDIR/bin/debug). Add this directory to the PATH environment variable.
The directory $VCDIR/data/scripts contains a number of scripts that fully automate the training and testing of the LPC-based VC system. Read the documentation to learn how to customize and run them.
Documentation
The documentation is included with the distributed code as a text file. It describes in detail the scripts included in the distribution for training a complete VC system, as well as the input data files required for training.
Download
This toolkit is made available under the terms of the GNU Lesser General Public License (LGPL) and it can be downloaded from here.
References
- Voice Conversion applied to Text-to-Speech Systems
  Helenca Duxans, PhD Thesis. Supervisor: Dr. Antonio Bonafonte.
  Barcelona, Spain. July 2006.
- Voice Conversion of Non-Aligned Data using Unit Selection
  H. Duxans, D. Erro, J. Pérez, F. Diego, A. Bonafonte, A. Moreno
  TC-Star Workshop on Speech to Speech Translation. Barcelona, Spain. June 2006.
- Including dynamic and phonetic information in voice conversion systems
  H. Duxans, A. Bonafonte, A. Kain, J. van Santen
  International Conference on Spoken Language Processing (ICSLP 2004). Jeju Island, Korea. October 2004.
- Estimation of GMM in Voice Conversion Including Unaligned Data
  Helenca Duxans, Antonio Bonafonte
  8th European Conference on Speech Communication and Technology (EUROSPEECH 2003). Geneva, Switzerland. September 2003.
Authors
Helenca Duxans
Javier Pérez
Antonio Bonafonte
Harmonic/Stochastic based Voice Conversion (HSM)
The second method is based on the harmonic/stochastic model (HSM), which is used to analyze, modify and synthesize the speech signals. The conversion itself relies on Gaussian mixture models (GMMs), which can be trained from parallel or non-parallel corpora. The non-parallel training procedure is suitable for cross-lingual applications because it uses only acoustic parameters. The harmonic component of the signal is converted with the trained transformation function, and the stochastic component is predicted from the converted harmonic component. Unvoiced frames are not modified. The pitch is also adapted to the target speaker by a linear transformation of log-f0 that maps the mean and variance of the source speaker's log-f0 onto those of the target speaker.
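The pitch adaptation described above is commonly written as a mean-and-variance mapping in the log domain, log f0_conv = mu_y + (sigma_y / sigma_x) * (log f0_src - mu_x). The sketch below assumes that form; it is not the toolkit's MATLAB code. It also assumes that statistics are computed over voiced frames only, that unvoiced frames are marked with f0 = 0 and passed through unchanged, and that the population standard deviation is used. The function names log_f0_stats and convert_f0 are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_f0_stats(f0: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of log-f0 over voiced frames (f0 > 0)."""
    logs = [math.log(f) for f in f0 if f > 0]
    mean = sum(logs) / len(logs)
    var = sum((v - mean) ** 2 for v in logs) / len(logs)  # population variance
    return mean, math.sqrt(var)


def convert_f0(f0: Sequence[float],
               src: Tuple[float, float],
               tgt: Tuple[float, float]) -> List[float]:
    """Map the source log-f0 mean/std onto the target's; unvoiced frames pass through."""
    mu_x, sd_x = src
    mu_y, sd_y = tgt
    out = []
    for f in f0:
        if f <= 0:
            out.append(f)  # unvoiced frame (assumed f0 = 0): not modified
        else:
            out.append(math.exp(mu_y + (sd_y / sd_x) * (math.log(f) - mu_x)))
    return out
```

For example, with source statistics taken from the contour [100, 0, 200] Hz and target statistics from [200, 400] Hz, both standard deviations are equal, so convert_f0 simply doubles every voiced value and leaves the unvoiced 0 unchanged.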
Installation
- mkdir upc_vc_hsm_toolkit
- cd upc_vc_hsm_toolkit
- tar zxf /path/to/upc_vc_hsm_toolkit.tgz
This extracts a number of MATLAB programs that can be used to perform voice conversion. It is recommended to add the upc_vc_hsm_toolkit directory to the MATLAB path. See the documentation (also included with the code) for instructions on how to use the toolkit.
Documentation
The documentation is included with the distributed code as a text file. It gives detailed instructions on how to use the different programs to train a complete VC system, and describes the input data files required for training.
Download
This toolkit is made available under the terms of the GNU Lesser General Public License (LGPL) and it can be downloaded from here.
References
- Weighted Frequency Warping for Voice Conversion
  D. Erro, A. Moreno
  InterSpeech 2007 - EuroSpeech. Antwerp, Belgium. August 2007.
- Frame Alignment Method for Cross-lingual Voice Conversion
  D. Erro, A. Moreno
  InterSpeech 2007 - EuroSpeech. Antwerp, Belgium. August 2007.
- Voice Conversion of Non-Aligned Data using Unit Selection
  H. Duxans, D. Erro, J. Pérez, F. Diego, A. Bonafonte, A. Moreno
  TC-Star Workshop on Speech to Speech Translation. Barcelona, Spain. June 2006.
- Sistema de Síntesis Armónico/Estocástico en modo Pitch-Asíncrono aplicado a Conversión de Voz [Pitch-Asynchronous Harmonic/Stochastic Synthesis System Applied to Voice Conversion]
  D. Erro, A. Moreno
  IV Jornadas en Tecnologías del Habla. Zaragoza, Spain. November 2006.
Authors
Daniel Erro
Asunción Moreno
This work has been funded by the European Union under the integrated project TC-STAR - Technology and Corpora for Speech to Speech Translation (IST-2002-FP6-506738).