Interspeech 2021 Special Session
Session Overview
Air-traffic management (ATM) is a specific domain where, in addition to the voice signal, other contextual information (e.g. air-traffic surveillance data, meteorological data) plays an important role. Automatic speech recognition (ASR) is the first challenge in the whole chain. Further processing usually requires transforming the recognised word sequence into a conceptual form, which matters more for ATM applications. This also means that the usual metrics for evaluating ASR systems (e.g. word error rate) are less important, and other performance criteria are employed: objective ones such as command recognition error rate, callsign detection accuracy, overall algorithmic delay, or real-time factor, and subjective ones such as a decrease in user workload. Past work mostly applied speech recognition to the speech of air-traffic controllers. Industrial partners now also consider processing the voice of pilots (with significantly higher noise levels), which can be useful for many applications, especially in the safety domain.
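To make the contrast concrete: the standard word error rate is computed as WER = (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words and N is the number of words in the reference transcript. An ATM-oriented metric such as command recognition error rate is instead computed over whole commands, e.g. the fraction of reference commands that are missed or recognised incorrectly, so a system can reach a low WER and still miss safety-relevant instructions. The latter definition is a common formulation given here only as an illustration, not one prescribed by this session.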
Air-traffic communication currently relies on two approaches: voice, and voiceless communication through data links. Although voice communication may seem destined to be abandoned in the near future, the opposite is the case. The International Civil Aviation Organization (ICAO) assumes that, in order to avoid distractions during critical phases of flight, “the controller should use voice to communicate with aircraft operating below 10,000 ft above ground level”; hence, voice communication remains the main way to exchange information and commands near airports. The long-term objective is to transfer certain commands and orders through a human-machine interface, thereby reducing the amount of spoken communication, decreasing the controllers’ workload, and enabling potential savings (fuel, carbon dioxide) through better controller performance. Other example applications include simulation, workload estimation, readback error detection, etc.
The main objective of the proposed special session is to invite key ATM players (both academic and industrial) interested in ASR to review ongoing work in this domain and to accelerate near-future R&D plans, enabling the integration of speech technologies into the challenging but highly safety-critical field of air-traffic management.
Based on the above, we are strongly convinced that the topic is NOT appropriate for regular conference sessions such as acoustic or language modeling for ASR. Furthermore, the potential participants of this session are expected to come from different communities.
Depending on the number of papers submitted to and accepted for this session (assuming an acceptance rate of around 50%, as has approximately been the case at past Interspeech conferences), we aim to organise a 2-hour session: an introductory talk given by the organisers (15 minutes), followed by poster presentations (1 hour), and closed by an open panel discussion on future plans (45 minutes).
Call for Papers:
We welcome submissions on, but not limited to, the following research directions:
- Automatic speech recognition for air-traffic communication
- Studies that use speech to further develop applications in ATM
- Combination of speech and other data (e.g. situational context) in ATM
- Supervised or semi-supervised training of acoustic and language models evaluated on ATC corpora
- Future trends in processing speech communication in ATM
Link:
https://www.idiap.ch/asr-atm-session.html
For more information, email: petr.motlicek@idiap.ch
Session format:
A poster session followed by a panel discussion.
Organizing committee (ordered alphabetically):
Hartmut Helmke
Pavel Kolcarek
Petr Motlicek