Automatic speaker recognition

Francisco Javier Hernando

The development of technologies that automatically recognize speakers by their voices has attracted growing interest over the past few years because of its numerous applications: access control, financial and commercial transactions, audio indexing of meetings and of radio and television programs, and police investigations. The field covers two tasks: identifying or verifying the identity of a speaker, and locating the boundaries between different speakers within a signal (speaker segmentation).
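The difference between identifying a speaker and verifying a claimed identity can be illustrated with a minimal sketch. It assumes each speaker is already represented by a fixed-length feature vector. The names `enrolled`, `identify` and `verify`, the toy vectors, the cosine score and the 0.8 threshold are all hypothetical choices made for illustration, not a description of any particular system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(test_vec, enrolled):
    """Closed-set identification: pick the enrolled speaker with the best score."""
    return max(enrolled, key=lambda name: cosine(test_vec, enrolled[name]))

def verify(test_vec, claimed, enrolled, threshold=0.8):
    """Verification: accept the claimed identity only if its score
    reaches the threshold (0.8 is an arbitrary illustrative value)."""
    return cosine(test_vec, enrolled[claimed]) >= threshold

# Hypothetical enrolled speakers and a test utterance (toy vectors).
enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.0]}
test_vec = [0.9, 0.1, 0.3]

print(identify(test_vec, enrolled))          # best-matching enrolled speaker
print(verify(test_vec, "alice", enrolled))   # is the claim "alice" accepted?
print(verify(test_vec, "bob", enrolled))     # is the claim "bob" accepted?
```

Identification answers "who is speaking?" by comparing against every enrolled speaker, whereas verification answers "is this the claimed speaker?" with a single comparison and a threshold.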

Speech signals depend on characteristics of the speaker such as the size of the vocal cords and vocal tract, state of health, mood and linguistic habits. The environment in which speech is produced must also be taken into account, because environmental conditions may distort the signal. The systems that have obtained the best results to date use so-called low-level parameters, namely pitch (tone), spectral magnitude and formant frequencies. However, high-level features such as dialect, vocabulary, intonation and utterance duration are also known to help differentiate speakers.
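As a rough illustration of two of the low-level parameters mentioned above, the following sketch estimates pitch by autocorrelation and computes a magnitude spectrum with a direct discrete Fourier transform. It is a toy under stated assumptions: the input is a synthetic 200 Hz tone rather than real speech, the function names and the 50 to 400 Hz search range are hypothetical, and formant estimation is not covered.

```python
import math

def autocorr_pitch(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame by picking
    the lag with the largest autocorrelation inside the search range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def magnitude_spectrum(frame):
    """Magnitude of the DFT for bins 0..N/2 (direct O(N^2) computation)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags

# Synthetic stand-in for a voiced frame: a pure 200 Hz tone, 50 ms at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]

pitch_hz = autocorr_pitch(frame, sample_rate)
spectrum = magnitude_spectrum(frame)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
peak_hz = peak_bin * sample_rate / len(frame)

print(pitch_hz, peak_hz)
```

For this synthetic tone both estimates should land at about 200 Hz. Real speech would need windowing, voicing detection and a faster transform, all of which are omitted here.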

The TALP Center focuses on the following lines of research:

  • Speaker identification and verification using high- and low-level parameters.
  • Combinations of high- and low-level parameters.
  • System robustness to environmental conditions.
  • Speaker segmentation.
  • Multimodal recognition of speakers.
  • Combination of various features: voices, faces, irises, fingerprints, etc.