Flexible Acoustic Data-Driven Grapheme to Subword Unit Conversion

Current state-of-the-art automatic speech recognition (ASR) systems commonly use hidden Markov models (HMMs). These systems assume that phonemes (phones) are the intermediate subword units, and they explicitly model each word to be recognized as a sequence of phonemes. Thus, despite the availability of sophisticated statistical modeling and machine learning techniques, developing an ASR system requires prior knowledge, such as lexical resources (e.g., a phoneme set and a lexicon), and at least some phonetic expertise. The lexicon of an ASR system contains the phonetic transcription of each word. One of the key aspects of lexicon development is learning the relation between graphemes (letters) and phonemes. This is often done with statistical methods such as decision trees or conditional random fields, which invariably rely on an initial lexicon containing good-quality pronunciations. Major languages such as English, French, German and Spanish have well-developed lexical resources. However, minority languages such as Scottish Gaelic and Afrikaans do not. As a result, ASR development has focused mainly on major languages.
Recently, we at Idiap developed a novel approach that uses new statistical models to learn the probabilistic relation between graphemes and phonemes from acoustic data. This has opened up multiple opportunities for further research and development. For instance, the approach makes it possible to exploit lexical and acoustic resources from one or more languages to develop lexical resources for another language. It also makes it possible to build an ASR system that uses subword units derived automatically from acoustic data instead of phonemes. Such systems are of great interest for the rapid development and deployment of ASR systems in any language.
The goal of the present project is to exploit this approach to (a) develop a framework for the flexible development of lexical resources for both major and minority languages, and (b) develop an ASR system that removes the need for linguistically motivated subword units (i.e., phonemes) and prior lexical resources while still yielding state-of-the-art performance.
Application Area: Human-Machine Interaction, Perceptive and Cognitive Systems
Idiap Research Institute
Hasler Foundation
Mar 01, 2013
Mar 01, 2016