PaSS - Pathological Speech Synthesis

Speech communication involves the production of acoustics through the movement of articulators (e.g., tongue, jaw) and the perception of that signal in terms of different kinds of information, such as the message, the speaker's identity, and the speaker's emotional state. Speech synthesis systems aim to produce speech that is as natural and intelligible as human speech. With advances in deep learning, speech synthesis systems can now match natural human speech in terms of quality. However, speech synthesis research has largely focused on typical healthy control speech (adult speech without impairments). An increasing number of people across the world live with debilitating speech pathologies (e.g., Parkinson's disease, laryngeal cancer) that lead to a breakdown of the synergy between speech production and speech perception. This in turn causes difficulties in daily speech communication and reduces quality of life. Thus, there is a need for assistive technologies that can help provide better treatment and care and improve the social inclusion of affected people.

The goal of the proposed project PaSS is to develop a framework for pathological speech synthesis that is clinically valid/acceptable, with the aim (a) to address data scarcity in pathological speech technology development, (b) to develop tools that could potentially be used for designing care pathways for the treatment of patients with speech impairments, and (c) to further our understanding of pathological speech. The proposed research is organized into two research themes (RTs), taking into account the challenges related to data scarcity, systematic evaluation of synthesized pathological speech, and interpretability/explainability of the developed systems:

RT1 - Pathological speech synthesis with objective speech assessment in the loop. Leveraging recent advances in few-shot speech synthesis, this RT will develop a closed-loop pathological speech synthesis system that is driven by objective speech assessment, and will validate the developed systems by putting clinicians in the loop.

RT2 - Linking pathological speech synthesis to speech production. This RT will build upon RT1 to explain the developed pathological speech synthesis approaches in terms of physiological aspects of speech production, through acoustic-to-articulatory inversion and articulatory speech synthesis models.

PaSS will develop and validate neural speech synthesis and voice conversion approaches for different pathological conditions, namely dysarthria, Parkinson's disease (PD), and the treatment of laryngeal and hypopharyngeal cancer. The research and development will build new synergies in collaboration with GITA Lab, University of Antioquia, Colombia, for PD progression and monitoring, and with the University Cancer Center Inselspital, Bern, for the research on treatment of laryngeal and hypopharyngeal cancer. The proposed R&D lies at the intersection of speech synthesis, speech assessment, machine learning (including deep learning), speech signal processing, and clinical speech science. PaSS requests funding for one full-time PhD student and one postdoctoral researcher at 50% for a period of four years.
Idiap Research Institute
SNSF (CH)
Mar 01, 2024
Feb 29, 2028