Vocal-tract based Fast Adaptation for Speech Technology

Recent advances in statistical text-to-speech synthesis (TTS) have enabled voice personalisation through the adaptation techniques normally associated with automatic speech recognition (ASR). These techniques allow a synthetic voice to be matched to a target speaker using only a short recording of that speaker. The desire for fast adaptation in TTS has prompted research into vocal tract length normalisation (VTLN) techniques, which need very little adaptation data because they have only a small number of parameters. Their success in TTS suggests that they could in turn be applied back to ASR. This proposal is for a short project to evaluate VTLN techniques as a prior for the more usual adaptation techniques in both ASR and TTS. The expected result is fast, natural-sounding adaptation that improves smoothly as more adaptation data becomes available.
Perceptive and Cognitive Systems
Idiap Research Institute
Hasler Foundation
Mar 15, 2012
Dec 15, 2012