Tilak Purohit and Barbara Ruvolo, contributing to Idiap’s AI for Life research program, won first prize at the Lemanic Life Science Hackathon, organized at EPFL at the end of April 2024. The team also included EPFL Life Science bachelor's students Alexandra Psaltis, Jia Xian Jennifer Shan, and Elise Boyer. Their outcome is an AI-enabled user interface prototype to support the detection of depression via speech.
Speech & Audio Processing
The expertise of the group encompasses statistical automatic speech recognition (based on hidden Markov models, or hybrid systems exploiting connectionist approaches), text-to-speech, and generic audio processing, covering sound source localization, microphone arrays, speaker diarization, audio indexing, very low bit-rate speech coding, and perceptual background noise analysis for telecommunication systems.
Group News
Every year, the Institute nominates two students for its internal awards. In 2022, the Paper Award goes to Alexandre Bittar, and the Student Award goes to Teguh Lembono. Congratulations!
Esaú Villatoro, research associate in the Speech & Audio Processing group at Idiap, and his colleagues from the Mathematics Research Center (CIMAT) in Mexico have won first place in two competitions related to Natural Language Processing. The objective of these competitions is to improve important aspects of Mexican society such as tourism and communication.
Idiap researchers published a paper describing an approach to speech processing based on the properties of the human brain. Their method proved as efficient as the current standard, whilst conserving the advantage of energy efficiency. Moreover, their work is replicable thanks to open access software paving the way for future applications.
Neural networks are often among the technologies cited when it comes to artificial intelligence's latest exploits. The downside is that this technology can be demanding in terms of energy costs. Two Idiap researchers demonstrated that in certain cases classical maths is a better choice than the AI hype.
Group Job Openings
Our group regularly posts job openings ranging from internships to researcher positions. To check the opportunities currently available or to submit a speculative application, use the link below.
Current Group Members
MAGIMAI DOSS, Mathew
(Senior Research Scientist)
- website
MOTLICEK, Petr
(Senior Research Scientist)
- website
GARNER, Philip
(Senior Research Scientist)
- website
VLASENKO, Bogdan
(Research Associate)
- website
VILLATORO TELLO, Esaú
(Research Associate)
- website
HOVSEPYAN, Sevada
(Research Associate)
- website
MURALIDHAR, Skanda
(Research Associate)
- website
CAROFILIS VASCO, Roberto Andrés
(Postdoctoral Researcher)
- website
RANGAPPA, Pradeep
(Postdoctoral Researcher)
- website
TORNAY, Sandrine
(Postdoctoral Researcher)
- website
HERMANN, Enno
(Postdoctoral Researcher)
- website
AKSTINAITE, Vita
(Postdoctoral Researcher)
- website
KULKARNI, Ajinkya (Vijay)
(Postdoctoral Researcher)
- website
SARKAR, Eklavya
(PhD Student / Research Assistant)
- website
THORBECKE (NIGMATULINA), Iuliia
(PhD Student / Research Assistant)
- website
EL HAJAL, Karl
(PhD Student / Research Assistant)
- website
PUROHIT, Tilak
(PhD Student / Research Assistant)
- website
CHEN, Haolin
(PhD Student / Research Assistant)
- website
HE, Mutian
(PhD Student / Research Assistant)
- website
BURDISSO, Sergio (Gastón)
(R&D / Research Assistant)
TARIGOPULA, Neha
(PhD Student / Research Assistant)
- website
PRASAD, Amrutha
(PhD Student / Research Assistant)
- website
KUMAR, Shashi
(PhD Student / Research Assistant)
- website
SANCHEZ-CORTES, Dairazalia
(Research Assistant)
- website
COPPIETERS DE GIBSON, Louise
(PhD Student / Research Assistant)
- website
MOSTAANI, Zohreh
(PhD Student / Research Assistant)
- website
KHALIL, Driss
(Junior R&D / Research Assistant)
- website
SANCHEZ LARA, Alejandra
(Research Intern)
- website
SYLA, Valmir
(Research Intern)
ZHANG, Alice (Chi)
(Visitor)
- website
Alumni
Please note that this list is not exhaustive.
- ABROL, Vinayak
- AICHINGER, Ida
- AJMERA, Jitendra
- ANTONELLO, Niccolò
- ARADILLA ZAPATA, Guillermo
- ATHINEOS, Marios
- BABY, Deepak
- BAHAADINI, Sara
- BARBER, David
- BENZEGHIBA, Mohamed (Faouzi)
- BORNET, Annie
- BOURLARD, Hervé
- CANDY, Romain
- CEREKOVIC, Aleksandra
- CEVHER, Volkan
- CHAVARRIAGA, Ricardo
- CHU, Dong
- COLLADO, Thierry
- CRITTIN, Frank
- DELEZE, Maxime
- DEY, Subhadeep
- DIGHE, Pranay
- DINES, John
- DRYGAJLO, Andrzej
- DUFFNER, Stefan
- ELBANNA, Gasser
- ESPUÑA FONTCUBERTA, Aleix
- FABIEN, Maël
- FAJČÍK, Martin
- FRITSCH, Julian (David)
- GALAN MOLES, Ferran
- GOMEZ ALANIS, Alejandro
- GRANDVALET, Yves
- GRANGIER, David
- HAGEN, Astrid
- HAJIBABAEI, Mahdi
- HALPERN, Bence
- HE, Weipeng
- HERMANSKY, Hynek
- HONNET, Pierre-Edouard
- IKBAL, Shajith
- IMSENG, David
- IVANOVA, Maria
- JAIMES, Alejandro (Alex)
- JEANNINGROS, Loïc
- KETABDAR, Hamed
- KHODABAKHSHANDEH, Hamid
- KHONGLAH, Banriskhem (Kayang)
- KHOSRAVANI, Abbas
- KODRASI, Ina
- KRSTULOVIC, Sacha
- LATHOUD, Guillaume
- LAZARIDIS, Alexandros
- LI, Weifeng
- LINKE, Julian
- LOUPI, Dimitra
- MARIÉTHOZ, Johnny
- MARTINS, Renato
- MASSON, Olivier
- MAYORAZ, André
- MBANGA NDJOCK, Pierre (Armel)
- MCCOWAN, Iain
- MEIER, Corentin
- MENDOZA, Viviana
- MILLÁN, José del R.
- MOORE, Darren
- MORRIS, Andrew
- MOULIN, François
- MUCKENHIRN, Hannah
- NALLANTHIGHAL, Venkata Srikanth
- NATUREL, Xavier
- PARIDA, Shantipriya
- PARTHASARATHI, Sree Hari Krishnan
- PINTO, Francisco
- PITON, Timothy
- POCARD, Valentin
- POTARD, Blaise
- RAZAVI, Marzieh
- SAHA, Atreyee
- SALAMIN, Chloé
- SAMUI, Suman
- SANTOS REVILLA, Andrea Elena
- SARFJOO, Saeed (Seyyed)
- SEBASTIAN, Jilt
- SHAHNAWAZUDDIN, Syed
- SHAKAS, Alexis
- SHANKAR, Ravi
- SHARMA, Shivam
- SINGH, Muskaan
- SRINIVASAMURTHY, Ajay
- STEPHENSON, Todd
- STERPU, George
- SZASZAK, György
- TKACZUK, Jakub
- TONG, Sibo
- TRUSCELLO, Léonard
- TYAGI, Vivek
- ULLMANN, Raphael
- VALENTE, Fabio
- VASQUEZ-CORREA, Juan Camilo
Active Research Grants
- ELOQUENCE - ELOQUENCE: Multilingual and Cross-cultural interactions for context-aware, and bias-controlled dialogue systems for safety-critical applications
- EMIL - Emotion in the loop – a step towards a comprehensive closed-loop deep brain stimulation in Parkinson’s disease
- EUROCONTROL - Integrate the Automatic Speech Recognition system with eDEP, ESCAPE and audiolan
- EVOLANG-2 - Evolving Language Phase 2
- IICT - Inclusive Information and Communication Technologies
- PASS - PaSS - Pathological Speech Synthesis
- SMILE-II - SMILE-II Scalable Multimodal sign language technology for sIgn language Learning and assessmEnt Phase-II
- TRACY - A big-data analyTics from base-stations Registrations And Cdrs e-evidence sYstem
Past Research Grants
- AAMASSE - Acoustic Model Adaptation toward Spontaneous Speech and Environment
- ADDG2SU - Flexible Acoustic Data-Driven Grapheme to Subword Unit Conversion
- ADDG2SU_EXT - Flexible Acoustic data-driven Grapheme to Subword Unit Conversion
- AI4EU - A European AI On Demand Platform and Ecosystem
- AMIDA - Augmented Multi-party Interaction with Distance Access
- AMSP - Auditory-motivated signal processing and applications to robust speech enhancement and recognition
- ATCO2 - Automatic collection and processing of voice data from air-traffic communications
- BIOWATCH - Biowatch
- CLAS3 - Cross-Lingual Adaptation for Text to Speech Synthesis (CLAS3)
- CMM - Conversation Member Match
- COBALT - Content Based Call Filtering
- CRITERIA - Comprehensive data-driven Risk and Threat Assessment Methods for the Early and Reliable Identification, Validation and Analysis of migration-related risks
- DAUM - Domain Adaptation Using Sub-Space Models
- DAUM2012 - Domain Adaptation Using Sub-Space Models
- DBOX - D-Box: A generic dialog box for multilingual conversational applications
- DEEPSTD-EXT - Universal Spoken Term Detection with Deep Learning (extension)
- DEVEL-IA - Formation « Développeurs spécialisés en Intelligence Artificielle » selon le modèle de formation continue duale postgrade
- DIMHA - Diarizing Massive Amounts of Heterogeneous Audio
- DM3 - Distributed MultiModal Media server, a low cost large capacity high throughput data storage system
- ELEARNING-VALAIS_3.0 - eLearning-Valais 3.0
- EMIME - Effective Multilingual Interaction in Mobile Environments
- EPOC - A personalized speech recognition framework for audio messaging on the edge
- ESGEM - Enhanced Swiss German mEdia Monitoring
- EVOLANG - Evolving Language
- FLEXASR - Flexible Grapheme-Based Automatic Speech Recognition
- FLOSS - Flexible Linguistically-guided Objective Speech aSessment
- GENEEMO - Geneemo: An Expressive Audio Content Generation Tool
- HAAWAII - Highly Automated Air Traffic Controller Workstations with Artificial Intelligence Integration
- ICS-2010 - Interactive Cognitive Systems
- IM2-3 - Interactive Multimodal Information Management Phase 3
- INEVENT - Accessing Dynamic Networked Multimedia Events
- JOGGL - jöggl (töggl for juristic applications)
- MALORCA - Machine Learning of Speech Recognition Models for Controller Assistance
- MEGANEPRO - Myo-Electricity, Gaze and Artificial Intelligence for Neurocognitive Examination and Prosthetics
- MOSPEEDI - MoSpeeDi. Motor Speech Disorders: characterizing phonetic speech planning and motor speech programming/execution and their impairments
- MPM - Multimodal People Monitoring
- MULTI08 - Multimodal Interaction and Multimedia Data Mining
- MULTI08EXT - Multimodal Interaction and Multimedia Data Mining
- MULTIVEO - High Accuracy Speaker-Independent Multilingual Automatic Speech Recognition System
- MUMMER - MultiModal Mall Entertainment Robot
- NEWSONAI - Scientific voice to latest news of AI technology 2022
- PANDA - Perceptual Background Noise Analysis for the Newest Generation of Telecommunication Systems
- PHASER - PHASER: Parsimonious Hierarchical Automatic Speech Recognition
- PHASER-QUAD - Parsimonious Hierarchical Automatic Speech Recognition and Query Detection
- REAPPS - Reinforced audio processing via physiological signals
- RECAPP - Making speech technology accessible to Swiss people
- ROCKIT - Roadmap for Conversational Interaction Technologies
- RODI - Role based speaker diarization
- ROXANNE - Real time network, text, and speaker analytics for combating organized crime
- SARAL - Summarization and domain-Adaptive Retrieval of Information Across Languages
- SCALE - Speech Communication with Adaptive Learning
- SCOREL2 - Automatic scoring and adaptive pedagogy for oral language learning
- SESAME - SEarching Swiss Audio MEmories
- SHAPED - SHAPED: Speech Hybrid Analytics Platform for consumer and Enterprise Devices
- SHISSM - Sparse and hierarchical Structures for Speech Modeling
- SIIP - Speaker Identification Integrated Project
- SMILE - Scalable Multimodal sign language Technology for sIgn language Learning and assessmEnt
- STARFISH - STARFISH: Safety and Speech Recognition with Artificial Intelligence in the Use of Air Traffic Control
- SUMMA - Scalable Understanding of Multilingual Media
- TA2 - Together Anywhere, Together Anytime
- TA2-EEU - Together Anywhere, Together Anytime - Enlarged European Union
- TAO-CSR - Task Adaptation and Optimisation for Conversational Speech Recognition
- TAPAS - Training Network on Automatic Processing of PAthological Speech
- TIPS - Towards Integrated processing of Physiological and Speech signals
- UNITS - Unified Speech Processing Framework for Trustworthy Speaker Recognition
- VEOVOX - VeoVox: Voice-Controlled Order-Taking System for Restaurants
- WAVE2-96 - H2020-SESAR-PJ.10-W2-Solution 96