YELLA Sree Harsha awarded the EPFL PhD degree for his work on speaker diarization
Speaker diarization is the task of identifying “who spoke when” in an audio stream containing multiple speakers. This is an unsupervised task as there is no a priori information about the speakers.
This thesis proposes methods to handle two sources of difficulty in meeting recordings: the spontaneous nature of the conversations, which produces overlapping speech, and the far-field nature of the audio, which results in a low signal-to-noise ratio.
To this end, the thesis proposes new approaches to several stages of speaker diarization: feature extraction, speech activity detection, the clustering criterion, and system output combination.
At the feature level, hidden layer activations of artificial neural networks trained on tasks related to speaker diarization, such as speaker classification and speaker comparison, are shown to provide information complementary to the traditional short-term spectral features used for diarization.
In speech activity detection, the thesis proposes a new method for overlapping speech detection that combines acoustic information with information from the structure of the conversation. To handle errors made in speech activity detection and the effects of background noise, the thesis proposes clustering within the information bottleneck with side information framework.
Finally, the thesis also proposes a combination method for different diarization systems. Here, the output of one diarization system is used to generate input features for another system.
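The general idea of feeding one system's output to another can be illustrated with a minimal sketch. This is not the thesis's actual method. It simply assumes that the first system's output is a list of (start, end, speaker) segments, turns it into per-frame speaker indicator vectors, and appends those to the acoustic features that a second system would consume. The function names, the 10 ms frame shift, and the indicator encoding are all hypothetical choices made for illustration.

```python
def segments_to_frame_features(segments, speakers, num_frames, frame_shift=0.01):
    """Convert (start_sec, end_sec, speaker) segments into per-frame indicator vectors.

    Hypothetical helper: each frame gets one value per speaker, 1.0 if the
    first system marked that speaker as active in the frame, else 0.0.
    Overlapping segments simply set more than one entry to 1.0.
    """
    index = {spk: i for i, spk in enumerate(speakers)}
    features = [[0.0] * len(speakers) for _ in range(num_frames)]
    for start, end, spk in segments:
        first = max(0, int(round(start / frame_shift)))
        last = min(num_frames, int(round(end / frame_shift)))
        for t in range(first, last):
            features[t][index[spk]] = 1.0
    return features


def append_features(acoustic, extra):
    """Concatenate two per-frame feature sequences frame by frame."""
    if len(acoustic) != len(extra):
        raise ValueError("feature sequences must have the same number of frames")
    return [a + e for a, e in zip(acoustic, extra)]


# Toy output of a first diarization system (assumed format, not real data).
first_system_output = [(0.0, 0.03, "A"), (0.02, 0.05, "B")]
indicator = segments_to_frame_features(first_system_output, ["A", "B"], num_frames=5)

# Placeholder two-dimensional acoustic features for the same five frames.
acoustic = [[0.1, 0.2] for _ in range(5)]
combined = append_features(acoustic, indicator)
```

In this toy example frame 2 is marked for both speakers, so the second system would see the overlap hypothesised by the first one. A real system would more likely pass soft scores, such as posteriors, than hard 0/1 indicators.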
All the methods proposed here have been evaluated on standard corpora containing meeting room conversations and have shown significant improvements over the baseline state-of-the-art speaker diarization systems.
Key Words: Speaker diarization, meeting room conversations, conversational speech, overlapping speech, side information, phoneme background model, artificial neural networks, bottleneck features, system combination.
Congratulations to Sree Harsha for his excellent work.
To download Sree Harsha's thesis, click on the following link: Speaker diarization of spontaneous meeting room conversations