MediaParl

MediaParl is a bilingual, Swiss-accented speech database containing recordings in both French and German as they are spoken in Switzerland.

Description

The data were recorded at the Valais Parliament. Valais is a bilingual Swiss canton with many local accents and dialects. The database therefore contains highly variable speech and is suitable for studying multilingual, accented and non-native speech recognition, as well as language identification and language switch detection.

The corpus is partitioned into training, development and test sets. Since we focus on bilingual (accented, non-native) speech, the test set (MediaParl-TST) contains all the speakers who speak in both languages. The remaining (non-bilingual) speakers have been randomly assigned to the training (MediaParl-TRN) and development (MediaParl-DEV) sets in a proportion of 9 to 1.
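The partitioning rule described above can be illustrated with a minimal sketch. It is not the script used to build the released sets. The function name `partition_speakers`, the input format (a mapping from speaker ID to the set of languages spoken), the random seed, and the choice to apply the 9-to-1 proportion at the speaker level are all assumptions made for illustration; the description does not state whether the proportion refers to speakers or to sentences.

```python
import random


def partition_speakers(speaker_languages, dev_fraction=0.1, seed=0):
    """Split speakers into (train, dev, test) lists of speaker IDs.

    speaker_languages: dict mapping a speaker ID to the set of languages
    that speaker uses (hypothetical input format).
    """
    # Every speaker who uses more than one language goes to the test set.
    test = sorted(s for s, langs in speaker_languages.items() if len(langs) > 1)
    # The remaining speakers are shuffled and split roughly 9 to 1.
    rest = sorted(s for s, langs in speaker_languages.items() if len(langs) <= 1)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(rest)
    n_dev = round(len(rest) * dev_fraction)
    dev = sorted(rest[:n_dev])
    train = sorted(rest[n_dev:])
    return train, dev, test


# Toy example: 10 French-only, 10 German-only and 2 bilingual speakers.
speakers = {f"fr{i:02d}": {"fr"} for i in range(10)}
speakers.update({f"de{i:02d}": {"de"} for i in range(10)})
speakers.update({"bi00": {"fr", "de"}, "bi01": {"fr", "de"}})

train, dev, test = partition_speakers(speakers)
print(len(train), len(dev), len(test))
```

Splitting by speaker rather than by sentence keeps each speaker in exactly one set, so no voice heard in training reappears in development or test.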

MediaParl-TRN contains 11,425 sentences (5,471 in French and 5,955 in German) spoken by 180 different speakers. MediaParl-DEV contains 1,525 sentences (646 in French and 879 in German) from 17 different speakers. MediaParl-TST contains 2,617 sentences (925 in French and 1,692 in German) from 7 different speakers. Each test speaker uses both languages, but we assume that each speaks more often in their mother tongue. Four of the test speakers are native German speakers and three are native French speakers.

Reference

David Imseng, Hervé Bourlard, Holger Caesar, Philip N. Garner, Gwénolé Lecorvé and Alexandre Nanchen, "MediaParl: Bilingual mixed language accented speech database", in: Proceedings of the 2012 IEEE Workshop on Spoken Language Technology, 2012.