Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network

Weipeng He, Petr Motlicek, Jean-Marc Odobez

Abstract

We propose a novel multi-task neural network-based approach for joint sound source localization and speech/non-speech classification in noisy environments. The network takes the raw short-time Fourier transform (STFT) as input and outputs likelihood values for the two tasks, which are used for the simultaneous detection, localization and classification of an unknown number of overlapping sound sources. Tested with real recorded data, our method achieves significantly better performance in speech/non-speech classification and in localization of speech sources than a method that performs localization and classification separately. In addition, we demonstrate that incorporating temporal context can further improve performance.
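The abstract says the two likelihood outputs are used to detect, localize and classify an unknown number of sources, but it does not spell out the output coding or the decoding step. The following is a minimal sketch under assumed conventions, not the authors' implementation: a 360-bin azimuth grid, a Gaussian-shaped target likelihood around each source direction (an assumed width of 8 degrees), and detection by picking local maxima of the localization output above an assumed threshold of 0.5, with each detected source labeled by the speech likelihood at the same bin. The function names `gaussian_likelihood` and `decode` are hypothetical.

```python
import math
from typing import List, Tuple


def gaussian_likelihood(n_dirs: int, sources: List[float], sigma: float = 8.0) -> List[float]:
    """Hypothetical target coding: for each azimuth bin, the maximum over sources
    of a Gaussian bump of the circular angular distance (degrees)."""
    out = []
    for i in range(n_dirs):
        theta = i * 360.0 / n_dirs
        best = 0.0
        for s in sources:
            d = abs(theta - s) % 360.0
            d = min(d, 360.0 - d)  # wrap-around distance, e.g. 350 vs 10 -> 20
            best = max(best, math.exp(-d * d / (sigma * sigma)))
        out.append(best)
    return out


def decode(loc: List[float], speech: List[float],
           threshold: float = 0.5) -> List[Tuple[float, bool]]:
    """Assumed decoding: every circular local maximum of the localization
    likelihood above `threshold` is a detected source (so the number of sources
    need not be known in advance); its class is read from the speech output."""
    n = len(loc)
    detections = []
    for i in range(n):
        left, right = loc[i - 1], loc[(i + 1) % n]  # circular neighbours
        if loc[i] >= threshold and loc[i] > left and loc[i] >= right:
            azimuth = i * 360.0 / n
            is_speech = speech[i] >= 0.5  # assumed speech decision threshold
            detections.append((azimuth, is_speech))
    return detections


if __name__ == "__main__":
    # Two simulated sources: speech at 30 degrees, non-speech at 200 degrees.
    loc = gaussian_likelihood(360, [30.0, 200.0])
    speech = gaussian_likelihood(360, [30.0])
    print(decode(loc, speech))  # [(30.0, True), (200.0, False)]
```

Peak picking, rather than a fixed top-k selection, is what lets such a decoder report a variable number of sources per frame; the threshold trades missed detections against false alarms.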

Type

Conference paper

Publication

In INTERSPEECH, 2018.

Date

June 2018

This paper was nominated for the Best Student Paper Award.