Srikanth Madikeri

Google Scholar Github

About me

I received my Ph.D. in Computer Science and Engineering from the Indian Institute of Technology Madras in 2013. During my Ph.D. I worked on automatic speaker recognition and spoken keyword spotting. I am currently working as a Research Associate at Idiap in the Speech Processing group. My research interests include Automatic Speech Recognition for low-resource languages with a focus on information extraction, Automatic Speaker Recognition, Language Recognition, Speaker Diarization, and, more recently, Spoken Dialog systems.

Contact

E-mail: firstname dot lastname at idiap dot ch

Education

Ph.D. in Computer Science and Engineering at IIT-Madras (2008-2013)
Bachelor of Engineering in Computer Science and Engineering, Anna University, Chennai (2004-2008)

Experience

Research Associate at Idiap Research Institute (2018-present)
Postdoctoral researcher at Idiap Research Institute (2013-2018)
3 years as Research Associate at IIT Madras (2010-2013)
2 years as Project Associate at IIT Madras (2008-2010)

Publications

Full list of publications

Journals and Book Chapters

Driss Khalil, Amrutha Prasad, Petr Motlicek, Juan Pablo Zuluaga, Iuliia Nigmatulina, Srikanth Madikeri, Christof Schuepbach, An Automatic Speaker Clustering Pipeline for Air Traffic Communication Domain, to appear in Special Issue on Automatic Speech Recognition and Understanding in Air Traffic Management.
Juan Zuluaga-Gomez, Iuliia Nigmatulina, Amrutha Prasad, Petr Motlicek, Driss Khalil, Srikanth Madikeri, Allan Tart, Igor Szoke, Vincent Lenders, Mickael Rigault, Khalid Choukri, Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding. Special Issue on Automatic Speech Recognition and Understanding in Air Traffic Management (2023), Aerospace 10.10 pp. 898.
N. Dawalatabad, S. Madikeri, C. C. Sekhar, and H. A. Murthy, "Novel architectures for unsupervised information bottleneck based speaker diarization of meetings", in IEEE Trans. on Audio, Speech, and Language Processing 2021.
I. Himawan, S. Madikeri, P. Motlicek, M. Cernak, S. Sridharan, and C. Fookes, "Voice Presentation Attack Detection Using Convolutional Neural Networks", Handbook of Biometric Anti-Spoofing, pp. 391-415. (Code)
S. Dey, P. Motlicek, S. Madikeri, M. Ferras, "Template-matching for text-dependent speaker verification", Speech Communication, Vol 88, pp. 96-105.
M. Ferras, S. Madikeri, H. Bourlard, "Speaker Diarization and Linking of Meeting Data", IEEE/ACM Trans. on Audio, Speech, and Language Processing, Vol. 24(11), pp. 1935-1945.
M. Ferras, S. Madikeri, P. Motlicek, S. Dey and H. Bourlard, "A large-scale open-source acoustic simulator for speaker recognition", IEEE Signal Processing Letters, Vol. 23(4), pp. 527-531. (Code)
S. Madikeri, "A fast and scalable hybrid FA/PPCA-based framework for speaker recognition", in Digital Signal Processing, Vol. 32, pp. 137-145, September 2014. (Code hosted at IIT-M)
S. Madikeri, A. Talambedu, and H. A. Murthy, "Modified group delay feature based total variability space modelling for speaker recognition", International Journal of Speech Technology, Vol. 18(1), pp. 17-23.

Conferences (selected)

Geoffroy Vanderreydt, Amrutha Prasad, Driss Khalil, Srikanth Madikeri, Kris Demuynck and Petr Motlicek, Parameter-Efficient Training with Adaptive Bottlenecks for Automatic Speech Recognition, to appear in the proceedings of Automatic Speech Recognition and Understanding, 2023.
Iuliia Nigmatulina, Srikanth Madikeri, Esaú Villatoro-Tello, Petr Motlicek, Juan Zuluaga-Gomez, Karthik Pandia, Aravind Ganapathiraju, Implementing contextual biasing in GPU decoder for online ASR, in Proc. of Interspeech 2023, pp. 4494-4498.
Sergio Burdisso, Esaú Villatoro-Tello, Srikanth Madikeri, Petr Motlicek. Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews. In Proc. of Interspeech 2023: pp. 3617-3621.
E. Villatoro, S. Madikeri, P. Motlicek, A. Ganapathiraju, A. Ivanov, "Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings", in Proc. of SIGIR 2022, pp. 2669-2674.
S. Madikeri, P. Motlicek, H. Bourlard, "Multitask adaptation with Lattice-Free MMI for multi-genre speech recognition of low resource languages", in Proc. of Interspeech 2021, pp 4329-4333.
A. Vyas, S. Madikeri, H. Bourlard, "Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model", in Proc. of Interspeech 2021, pp. 2861-2865.
S. Sarfjoo, S. Madikeri, P. Motlicek, "Speech Activity Detection Based on Multilingual Speech Recognition System", in Proc. of Interspeech 2021
R. Braun, S. Madikeri, P. Motlicek, "A Comparison of Methods for OOV-Word Recognition on a New Public Dataset", in Proc. of ICASSP 2021.
A. Vyas, S. Madikeri, H. Bourlard, "Lattice-free MMI adaptation of self-supervised pretrained acoustic models", in Proc. of ICASSP 2021.
S. Madikeri, B. Khonglah, S. Tong, P. Motlicek, H. Bourlard and D. Povey, "Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition System", in Proc. of Interspeech 2020. (Kaldi recipe)
S. Sarfjoo, S. Madikeri, P. Motlicek, S. Marcel, "Supervised domain adaptation for text-independent speaker verification using limited data", in Proc. of Interspeech 2020.
B. Khonglah, et al., "Incremental Semi-supervised Learning for Multi-Genre Speech Recognition", in Proc. of IEEE ICASSP 2020.
E. Boschee, et al., "SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage", in Proc. of the 57th Conference of the Association for Computational Linguistics: System Demonstrations, pp. 19-24. (link)
S. Madikeri, S. Dey, P. Motlicek, "A Bayesian Approach to Inter-task fusion for speaker recognition", in Proc. of ICASSP 2019, pp. 5786-5790.
S. Dey, S. Madikeri, and P. Motlicek, "End-to-end text-dependent speaker verification using novel distance measures", in Proc. of Interspeech 2018, pp. 3598-3602.
S. Madikeri, S. Dey, and P. Motlicek, "Analysis of Language Dependent Front-End for Speaker Recognition", in Proc. of Interspeech 2018, pp. 1101-1105.
S. Dey, P. Motlicek, S. Madikeri, and M. Ferras, "Exploiting sequence information for text-dependent speaker verification", in Proc. of ICASSP 2017, pp. 5370-5374.
S. Dey, S. Madikeri, and P. Motlicek, "Information theoretic clustering for unsupervised domain-adaptation", in Proc. of ICASSP 2016, pp. 5580-5584.
M. Ferras, S. Madikeri, P. Motlicek, and H. Bourlard, "System fusion and speaker linking for longitudinal diarization of tv shows", in Proc. of ICASSP 2016, pp. 5495-5499.
S. Dey, S. Madikeri, M. Ferras, and P. Motlicek, "Deep neural network based posteriors for text-dependent speaker verification", in Proc. of ICASSP 2016, pp. 5050-5054.
N. Dawalatabad, S. Madikeri, C. C. Sekhar, and H. A. Murthy, "Two-Pass IB Based Speaker Diarization System Using Meeting-Specific ANN Based Features", in Proc. of Interspeech 2016, pp. 2199-2203.
M. Ferras, S. Madikeri, S. Dey, P. Motlicek, and H. Bourlard, "Inter-Task System Fusion for Speaker Recognition", in Proc. of Interspeech 2016, pp. 1810-1814.
S. Madikeri, and H. Bourlard, "KL-HMM based speaker diarization system for meetings", in Proc. of ICASSP 2015, Brisbane, Australia, pp. 4435-4439.
P. Motlicek, S. Dey, S. Madikeri, and L. Burget, "Employment of Subspace Gaussian Mixture Models in speaker recognition", in Proc. of ICASSP 2015, Brisbane, Australia, pp. 4445-4449.
I. Himawan, P. Motlicek, M. Ferras, S. Madikeri, "Towards utterance-based neural network adaptation in acoustic modeling", in Proc. of IEEE ASRU 2015.
S. Madikeri, and H. Bourlard, "Filterbank slope based features for speaker diarization", in Proc. of ICASSP 2014, Florence, Italy, pp. 111-115.
S. Madikeri, "A Hybrid Factor Analysis and Probabilistic PCA-based system for Dictionary Learning and Encoding for Robust Speaker Recognition", In Odyssey 2012-The Speaker and Language Recognition Workshop [pdf].
S. Madikeri and H. A. Murthy, "Mel Filter Bank energy-based Slope feature and its application to speaker recognition", in Proc. of the National Conference on Communications (NCC) 2011, pp. 1-4, 28-30 Jan. 2011, doi: 10.1109/NCC.2011.5734713
S. Madikeri, and H. A. Murthy, "Discriminative training of Gaussian mixture speaker models: A new approach", in Proc. of the National Conference on Communications (NCC) 2010, pp. 1-5, 29-31 Jan. 2010, doi: 10.1109/NCC.2010.5430204 (Best Paper Award in Signal Processing Track)

Code/Toolkits

Pkwrap: a PyTorch wrapper for LF-MMI training in Kaldi (arXiv)
Multilingual LF-MMI training: sample recipe is available here
Standard i-vector implementation for Kaldi
IB diarization toolkit (in C++)

Professional Activities and Awards

Winner of the International Create Challenge 2017
Best paper award at NCC 2010 for the paper titled "Discriminative training of Gaussian mixture speaker models: A new approach" in the Signal Processing Track

Current Projects (selected)

REAL TIME NETWORK, TEXT, AND SPEAKER ANALYTICS FOR COMBATING ORGANIZED CRIME (ROXANNE): See here for a brief description.

Past Projects (selected)

SUMMARIZATION AND DOMAIN-ADAPTIVE RETRIEVAL OF INFORMATION ACROSS LANGUAGES (SARAL): See here for project description. Our work for this project involves building Automatic Speech Recognition systems for low-resource languages (Tagalog, Swahili, Somali, Lithuanian and Bulgarian, so far) using techniques such as multilingual training and semi-supervised learning.
Speaker Identification Integrated Project: EU project with 17 partners including LEAs (Law Enforcement Agencies). Our work focused on developing speaker identification engines and fusion modules that use metadata from gender identification and accent identification engines.
DimHA: We worked on developing fast speaker diarization systems using the Information Bottleneck (IB) framework.

Tel: +41 27 721 7743
Office: 304-4