
Easy Adaptation of Speech Recognition to Different Air Traffic Control Environments using the DeepSpeech Engine

Kleinert, Matthias and Venkatarathinam, Narasimman and Helmke, Hartmut and Ohneiser, Oliver and Strake, Maximilian and Fingscheidt, Tim (2021) Easy Adaptation of Speech Recognition to Different Air Traffic Control Environments using the DeepSpeech Engine. 11th SESAR Innovation Days, 2021-12-07 - 2021-12-09, Virtual.

Full text not available from this repository.


Recognizing and understanding human speech has become commonplace through systems such as Alexa®, the Google Assistant, and Siri®. Speech also plays a major role in air traffic control (ATC), as voice communication between air traffic controllers (ATCos) and pilots is essential for ensuring safe and efficient air traffic. This communication is still analogue, so ATCos must enter the same content again into digital systems using additional input devices. Automatic speech recognition (ASR) can automate this digitization process and is important for optimizing the ATCo workflow. This paper investigates the applicability of DeepSpeech, an open-source, easy-to-adapt, end-to-end speech recognition engine from the Mozilla Corporation, as a speech-to-text solution for ATC speech. Two training approaches are explored: training a model from scratch and adapting a model pre-trained on non-ATC speech. Model adaptation is performed using techniques such as fine-tuning, transfer learning, and layer freezing. Furthermore, the effect of combining an additional language model with the end-to-end trained model is evaluated and shown to yield a considerable relative improvement of 61% in word error rate. Overall, a word error rate of 6.0% is achieved on voice recordings from operational and simulation environments of different airspaces, resulting in command recognition rates between 85% and 97%. The results show that DeepSpeech is a highly relevant solution for ATC speech, especially considering that it includes easy-to-use adaptation mechanisms that are accessible even to non-experts in speech recognition.
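The abstract reports results as a word error rate (WER) and as a relative improvement in WER. As a minimal sketch of how these two quantities are commonly computed, the Python below uses the standard word-level edit distance definition. The function names `word_error_rate` and `relative_improvement` are hypothetical, the example transcripts are invented for illustration, and nothing here is taken from the paper's evaluation code or data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which the WER drops relative to the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Invented ATC-style utterance, not taken from the paper's data.
    reference = "lufthansa one two three descend flight level eight zero"
    hypothesis = "lufthansa one two tree descend level eight zero"
    print(word_error_rate(reference, hypothesis))
    # Illustrative WER values only, not results from the paper.
    print(relative_improvement(0.10, 0.04))
```

Under this definition, a relative improvement expresses the reduction as a fraction of the baseline WER, not as a difference in percentage points.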

Item URL in elib:https://elib.dlr.de/145397/
Document Type:Conference or Workshop Item (Speech)
Title:Easy Adaptation of Speech Recognition to Different Air Traffic Control Environments using the DeepSpeech Engine
Authors | Institution or Email of Authors | Author's ORCID iD | ORCID Put Code
Kleinert, Matthias | UNSPECIFIED | https://orcid.org/0000-0002-0782-4147 | UNSPECIFIED
Helmke, Hartmut | UNSPECIFIED | https://orcid.org/0000-0002-1939-0200 | UNSPECIFIED
Ohneiser, Oliver | UNSPECIFIED | https://orcid.org/0000-0002-5411-691X | UNSPECIFIED
Refereed publication:Yes
Open Access:No
Gold Open Access:No
In ISI Web of Science:No
Keywords:automatic speech recognition; ASR; air traffic control; ATC; DeepSpeech; ontology; domain adaptation
Event Title:11th SESAR Innovation Days
Event Location:Virtual
Event Type:International Conference
Event Start Date:7 December 2021
Event End Date:9 December 2021
Organizer:SESAR Joint Undertaking
HGF - Research field:Aeronautics, Space and Transport
HGF - Program:Aeronautics
HGF - Program Themes:other
DLR - Research area:Aeronautics
DLR - Program:L - no assignment
DLR - Research theme (Project):L - Managementaufgaben Luftfahrt
Location: Braunschweig
Institutes and Institutions:Institute of Flight Guidance > Controller Assistance
Deposited By: Kleinert, Matthias
Deposited On:13 Dec 2021 09:44
Last Modified:24 Apr 2024 20:44
