ASR challenge with code-switching in multilingual nations
The number of multilingual people in the world continues to grow. Code-switching (CS, sometimes also referred to as code-mixing) is the use of elements from more than one language in the same utterance, as in the German-English ‘Für HEAVEN’S Willen!’ (‘For HEAVEN’S sake!’). It is a vital and widespread form of language use among bilingual speakers. Whether a speaker code-switches can depend on the properties of the words, on the spoken context (switches are typically more frequent in technical or international settings), and on the speaker's relative proficiency in both languages. This phenomenon poses a significant challenge for automatic speech recognition (ASR) systems.
Switzerland is a multilingual nation with four recognized national languages: German (spoken by about 63% of the population), French (23%), Italian (8%), and Romansh (0.5%). This does not mean that all Swiss constantly switch between several languages, although most of them regularly use more than one language at home or at work.
Switzerland's linguistic communities are tied to specific territories: a town with one official language may sit next to a town with a different one. Proficiency in the non-native national languages varies, but more and more English is heard in Switzerland, owing not only to globalization and the presence of more foreign companies in the big cities but also to its increasing use as a bridge between Swiss people from different linguistic regions. In recent years, English has become a standard foreign language taught in schools (along with one Swiss national language) in many cantons, but it is still far from the only non-national language heard in Switzerland. If you open a newspaper, look at an advert, or listen to a conversation between young(ish) people in German-speaking Switzerland, it won’t take long before you encounter an English word. The truth is you’re never far from an English word in Switzerland.
On a syntactic level, code-switching is divided into intra-sentential and inter-sentential switches. Typical examples of intra-sentential switches are phrasal elements from the embedded language (English) that occur in a matrix language (German) sentence, as in ‘Berlin sei eben the place to be, erklärt ein Banker’ (‘Berlin is just the place to be, says a banker’). Inter-sentential code-switching, on the other hand, consists of grammatically complete English sentences that are added as non-obligatory clauses to a German sentence or that occur outside the textual space of a German sentence, as in ‘When in Rome, do as the Romans do. Dieses englische Sprichwort drückt eine Binsenweisheit aus’ (‘When in Rome, do as the Romans do. This English proverb expresses a truism’). Because mixing languages within an utterance produces larger acoustic variation, intra-sentential code-switching is much more difficult for an ASR system.
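The distinction between the two switch types can be illustrated with a small sketch. It assumes that each word already carries a language tag (for example "de" or "en") and that the utterance has been split into sentences; the function name classify_switch and the tag values are hypothetical and chosen only for illustration.

```python
from typing import List


def classify_switch(sentences: List[List[str]]) -> str:
    """Classify an utterance from per-word language tags.

    `sentences` holds one list of language tags per sentence,
    e.g. [["de", "de", "en", "en", "de"]]. The tags are assumed
    to come from a separate annotation or tagging step.
    """
    # A single sentence that mixes languages is an intra-sentential switch.
    if any(len(set(tags)) > 1 for tags in sentences):
        return "intra-sentential"

    # Every sentence is monolingual here; if the language differs
    # from one sentence to another, the switch is inter-sentential.
    languages = {tags[0] for tags in sentences if tags}
    if len(languages) > 1:
        return "inter-sentential"

    return "monolingual"


# "Berlin sei eben the place to be, erklärt ein Banker"
banker = [["de", "de", "de", "en", "en", "en", "en", "de", "de", "de"]]
# An English proverb followed by a German sentence
proverb = [["en"] * 8, ["de"] * 7]
```

Under these assumptions, classify_switch(banker) yields "intra-sentential" and classify_switch(proverb) yields "inter-sentential".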
Unfortunately, because few resources are available for English code-switching in German speech, very few studies exist. We at Idiap have developed a German-English code-switching evaluation set based on the German Spoken Wikipedia Corpus (SWC). SWC is a large collection of speech data read by volunteers, covering a broad variety of Wikipedia topics. Owing to the encyclopedic nature of the articles and the diverse range of technical and scientific topics, the recordings include a large amount of borrowing and code-switching. The word-level alignment provided with this corpus allows us to extract segments with intra-sentential code-switching and to build an evaluation scenario that can be used as a benchmark for research on code-switching speech recognition. The SWC is an open-source dataset licensed under CC-BY-SA 3.0. The code-switching benchmark is available at this link: Code-Switching Speech Corpus
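As a rough illustration of how word-level alignments make such an extraction possible, the sketch below selects sentences that contain both German and English words and returns their time spans. It is not the procedure used to build the benchmark: the AlignedWord structure, the extract_cs_segments function, and the per-word language tags are all assumptions made for this example, and the tags would have to come from a separate language-tagging step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AlignedWord:
    """One word with its alignment (hypothetical structure)."""
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds
    lang: str     # "de" or "en", assumed to come from a tagging step


def extract_cs_segments(
    sentences: List[List[AlignedWord]],
) -> List[Tuple[float, float, str]]:
    """Return (start, end, transcript) for sentences mixing German and English."""
    segments = []
    for words in sentences:
        languages = {w.lang for w in words}
        # Keep only sentences in which both languages occur.
        if {"de", "en"} <= languages:
            transcript = " ".join(w.text for w in words)
            segments.append((words[0].start, words[-1].end, transcript))
    return segments


# Toy input: one mixed sentence and one monolingual sentence.
mixed = [
    AlignedWord("Berlin", 0.0, 0.5, "de"),
    AlignedWord("sei", 0.5, 0.7, "de"),
    AlignedWord("the", 0.7, 0.8, "en"),
    AlignedWord("place", 0.8, 1.2, "en"),
]
mono = [AlignedWord("Hallo", 1.5, 2.0, "de")]
segments = extract_cs_segments([mixed, mono])
```

With the toy input, only the mixed sentence is kept, and its start and end times can then be used to cut the corresponding audio segment.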