This is a proposal in the area of acoustic modelling for automatic speech recognition (ASR). Current approaches to ASR are based on hidden Markov models (HMMs). Whilst HMMs can capture the time-varying nature of speech, they are well known to capture only surface acoustic features rather than the underlying structure of speech. This leads to an inability to deal with changes in “domain”, where a domain may be defined by, for example, speaker, noise condition, accent, age or dialect. To cope with such changes, the concept of HMM adaptation has found considerable favour; all state-of-the-art HMM systems now use some form of domain adaptation, usually for speaker and noise.
Recently, in light of this situation, a new paradigm has emerged in which the linear transformations associated with adaptation are an integral part of the HMM. The HMM is taken to be a canonical model of speech, which must then be adapted to the domain at hand. Given this tighter integration of the HMM and the linear transforms, however, the overall model can be significantly optimised via parameter sharing, and can take better advantage of disparate data. One such model is the subspace Gaussian mixture model (SGMM). The SGMM can be thought of as comprising a conventional HMM-GMM system together with a set of linear transforms defining a subspace; the subspace is intended to represent the range of movement of the human vocal tract, whilst the linear transforms represent the domain, and many parameters are shared across states and domains (a sketch of the parameterisation is given below).

In this project, we will take the SGMM as representative of this class of model, and use it in the context of out-of-domain tasks. In the first phase of the project, we will build a baseline and apply it to English speaker and accent adaptation; in this case, the out-of-domain variability is that of accent. In the second phase, we will attempt to bring in more varied out-of-domain data, both for training and for adaptation. We also hope to adapt to dialect, either in English or in, for instance, Swiss German, which is heavily accented and exhibits considerable dialect variation. We further hope to investigate tree-structured hierarchies of acoustic models; such hierarchies can be computed easily in the SGMM framework, and lend themselves well to parallel, cloud-based architectures. We expect the project to yield an acoustic modelling paradigm that outperforms standard HMMs with adaptation, allows out-of-domain data to be used, and has fewer parameters than conventional models.
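For concreteness, the following is a minimal sketch of the SGMM emission density as it is usually formulated (sub-state mixture weights are omitted for brevity, and the notation here is ours and should be taken as illustrative). The acoustic feature vector is x, j indexes HMM states, and i indexes a pool of I Gaussians shared by all states. Each state is described only by a low-dimensional vector v_j, and the domain (e.g. the speaker) by a vector v^(s); the projection matrices M_i and N_i, the weight vectors w_i and the covariances Sigma_i are shared globally, which is the source of the parameter saving noted above.

% Sketch of the SGMM emission density for state j (sub-states omitted).
\begin{align}
  p(\mathbf{x} \mid j) &= \sum_{i=1}^{I} w_{ji}\,
      \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu}_{ji}^{(s)},\, \boldsymbol{\Sigma}_i\right), \\
  \boldsymbol{\mu}_{ji}^{(s)} &= \mathbf{M}_i \mathbf{v}_j + \mathbf{N}_i \mathbf{v}^{(s)}, \\
  w_{ji} &= \frac{\exp\!\left(\mathbf{w}_i^{\top} \mathbf{v}_j\right)}
                 {\sum_{i'=1}^{I} \exp\!\left(\mathbf{w}_{i'}^{\top} \mathbf{v}_j\right)}.
\end{align}

Under this view, adapting to a new domain amounts to estimating the low-dimensional vector v^(s) (and, where appropriate, the state vectors v_j) rather than re-estimating full sets of Gaussian parameters.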