Auditory-motivated signal processing and applications to robust speech enhancement and recognition

Most speech recognition systems perform well in a controlled laboratory environment. However, their performance in real-world noise is limited by acoustic degradation, large variability, and mismatch between the training and test phases. Humans, on the other hand, are known to possess robust recognition abilities even in very noisy surroundings. This robustness is largely due to adaptation to background noise and to selective attention. Many nonlinear signal-processing mechanisms that take place within the human auditory system contribute to it. The main goal of this project is to develop signal-processing models that emulate the known auditory mechanisms and to use them as the front-end that derives robust patterns/features serving as input to the back-end speech recognizer.
The Idiap Research Institute (http://www.idiap.ch) is an independent, not-for-profit research institute located in Martigny, Switzerland, and affiliated with the Swiss Federal Institute of Technology at Lausanne (EPFL) and the University of Geneva. Idiap is involved in numerous national and international (European Union and United States of America) projects, as well as in multiple collaborative projects with industry (including a few Idiap spin-off companies). At the national level, Idiap is also the "Leading House" of the National Center of Competence in Research (NCCR) on "Interactive Multimodal Information Management" (IM2, http://www.im2.ch). Idiap is a pioneer in speech technology, in particular automatic speech recognition, speaker recognition, speaker diarization, microphone array processing, and speech synthesis. The Idiap speech group has made several important publications and is a sought-after destination for state-of-the-art speech research. Its expertise and innovation lie mainly in statistical modelling and the design of recognition engines, which form the core of speech recognition systems.
Our expertise is in the complementary component: the front-end, which primarily involves signal processing. Specifically, we are interested in building signal-processing models derived from experimental evidence on auditory mechanisms. The underlying goal is to combine these complementary strengths and develop a system that outperforms the state of the art. On the signal-processing front, this joint effort will result in new auditory wavelet bases with optimal time-frequency properties.
Application Area - Human Machine Interaction, Perceptive and Cognitive Systems
Indian Institute of Science
Idiap Research Institute
SNSF
Jan 16, 2012
Jul 15, 2012