README
------

The tool consists of the following Matlab m-files:

    hsmanalyze.m
    hsmanalyzeall.m
    hsmsynthesize.m
    hsmsynthesizeall.m
    hsmcorpusp.m
    hsmcorpusx.m
    hsmvctrain.m
    hsmvctrainnp.m
    hsmvcgmm.m
    hsmvcwfw.m
    gmm2wfw.m

Each of these m-files contains a short description of the function, which
is displayed by the Matlab "help" command. Example:

>> help hsmanalyze

The functions are configured to operate on "wav" files sampled at 16 kHz.


HOW TO CONVERT VOICES
---------------------

1. Analysis
-----------

First of all, the audio files have to be analyzed according to the
harmonic plus stochastic model (HSM). The following command can be used:

>> hsmanalyzeall

The function lets the user choose the directory that contains the 16 kHz
"wav" files. Each "wav" file is analyzed automatically, and an associated
"mat" file with the same base name is created to store its HSM parameters.

IMPORTANT: in this version, the pitch contour of each "wav" file is
extracted from the associated "pit" file, which contains the positions of
the pitch marks. The "pit" files have to be provided by the user and
placed in the same location as the "wav" files. The "pit" file that
corresponds to "file00.wav" has to be called "file00.pit". Each of its
lines contains the time instant (in seconds) of one pitch mark, written as
text. Example:

0.950063
0.95675
0.964313
0.972063
0.979437
0.988125
0.996437
1.00525
1.01388
1.02262
...

The function hsmsynthesize[all] can be used to transform "mat" files into
"wav" files.

2. Training of the conversion function
--------------------------------------

Once the audio files have been analyzed, the training of the voice
conversion function can start.

If the audio files of the source and target speakers are parallel
recordings, a parallel corpus is built using the command hsmcorpusp.
Example of usage:

>> hsmcorpusp('spkA\file',1:25,'spkB\file',1:25,'corpora\corpuspAB',2);

The parallel corpus "corpora\corpuspAB.mat" is created from the files
spkA\file01.mat ... spkA\file25.mat and their paired files
spkB\file01.mat ... spkB\file25.mat. The last parameter of the function
indicates that the indexes 1 to 25 are converted into strings of length 2
and appended to the given strings ('spkA\file' and 'spkB\file') to build
the complete file names.

The correspondence between the frames of the parallel files is established
using their segmentations. Thus, a set of "mar" files is required for the
training process. The "mar" files are text files that contain at least the
following information for each phoneme:

PHS: 0.922000,1.014000,1.082000,i
PHS: 1.082000,1.094000,1.098000,D
PHS: 1.098000,1.170000,1.230000,i
PHS: 1.230000,1.258000,1.306000,T
PHS: 1.306000,1.346000,1.390000,e
...

Each line is composed of the label "PHS: ", the time when the phoneme
begins (in seconds), the time of maximum stability inside the phoneme (not
used in this version), the time when the phoneme ends, and the phoneme
label.

The voice conversion function is automatically trained using hsmvctrain.
In the following example, a new transformation function called vcAB is
created from the parallel corpus previously built:

>> hsmvctrain('corpora\corpuspAB','VCfunctions\vcAB');

If only non-parallel recordings are available, the voice conversion
function is obtained by means of hsmcorpusx and hsmvctrainnp. Equivalent
example:

>> hsmcorpusx('spkA\file',1:25,'corpora\corpusxA',2);
>> hsmcorpusx('spkB\file',1:25,'corpora\corpusxB',2);
>> hsmvctrainnp('corpora\corpusxA','corpora\corpusxB','VCfunctions\vcnpAB');

In this case, the "mar" files are not required.

Note that in both methods the transformation function from speaker B to A
is also calculated.

If the desired voice conversion method is weighted frequency warping
(WFW), a correspondence has to be calculated between the poles of several
envelope pairs.
In this version, this can be done manually using the function gmm2wfw.
Use the Matlab help command for more information.

>> help gmm2wfw
>> gmm2wfw('VCfunctions\vcAB','VCfunctions\vcABwfw');

3. Converting files
-------------------

Two methods are available: GMM-based transformation and WFW.

>> hsmvcgmm
>> hsmvcwfw

These functions let the user choose the trained VC function, the file of
the source speaker to be converted, and the name of the converted file.
The converted files are written in "mat" and "wav" format. If the chosen
method is WFW, the transformation function is assumed to have been
obtained using gmm2wfw.
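
The file conventions described in sections 1 and 2 (numbered file names,
"pit" files and "mar" files) can be checked before running the tool. The
following is only a minimal sketch under stated assumptions: it is not
part of the tool, all variable names are hypothetical, and the zero
padding of the index is inferred from the file01 ... file25 example.

```matlab
% Sketch only: illustrates the file conventions, not the tool's own code.
% All variable names here are hypothetical.

% 1) File names: prefix plus an index written as a string of fixed length
%    (assumed zero-padded, as in file01 ... file25).
prefix  = 'file';
ndigits = 2;
fmt     = ['%s%0' num2str(ndigits) 'd'];
names   = arrayfun(@(k) sprintf(fmt, prefix, k), 1:25, ...
                   'UniformOutput', false);

% 2) "pit" file: one pitch-mark time (in seconds) per line, as text.
%    A temporary file is written here so that the sketch is self-contained.
marks   = [0.950063 0.95675 0.964313 0.972063 0.979437];
pitfile = [tempname '.pit'];
fid = fopen(pitfile, 'w');
fprintf(fid, '%g\n', marks);
fclose(fid);

fid = fopen(pitfile, 'r');
t = fscanf(fid, '%f');          % column vector of pitch-mark times
fclose(fid);
delete(pitfile);

marks_ok = all(diff(t) > 0);    % pitch marks should be strictly increasing
f0 = 1 ./ diff(t);              % rough local F0 (Hz) between adjacent marks

% 3) "mar" line: label, begin time, stability time, end time, phoneme.
marline = 'PHS: 0.922000,1.014000,1.082000,i';
tok = regexp(marline, '^PHS:\s*([^,]+),([^,]+),([^,]+),(.+)$', ...
             'tokens', 'once');
tbegin  = str2double(tok{1});
tstable = str2double(tok{2});   % not used by this version of the tool
tend    = str2double(tok{3});
phoneme = strtrim(tok{4});
```

The F0 values are meaningful only between consecutive marks of the same
voiced segment. A "mar" line whose begin time is not smaller than its end
time, or a "pit" file for which marks_ok is false, probably indicates a
formatting problem in the input files.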