
Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition

Prasad, Amrutha and Zuluaga-Gomez, Juan Pablo and Motlicek, Petr and Sarfjoo, Saeed and Nigmatulina, Iuliia and Ohneiser, Oliver and Helmke, Hartmut (2022) Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition. In: SESAR Innovation Days 2022. SESAR Innovation Days 2022, 2022-12-05 - 2022-12-08, Budapest, Hungary.


Abstract

Automatic Speech Recognition (ASR) for air traffic control is generally trained by pooling Air Traffic Controller (ATCO) and pilot data. In practice, this is motivated by the smaller proportion of annotated data from pilots than from ATCOs. However, due to the data imbalance between ATCOs and pilots and their varying acoustic conditions, ASR performance is usually significantly better for ATCO speech than for pilot speech. Obtaining the speaker roles requires manual effort when the voice recordings are collected using Very High Frequency (VHF) receivers and the data is noisy and in a single channel without the push-to-talk (PTT) signal. In this paper, we propose to (1) split the ATCO and pilot data using an intuitive approach exploiting ASR transcripts and (2) consider ATCO and pilot ASR as two separate tasks for Acoustic Model (AM) training. The paper focuses on applying this approach to noisy data collected using VHF receivers, as this data is helpful for training despite its noisy nature. We also developed a simple yet efficient knowledge-based system for speaker role classification based on grammar defined by the International Civil Aviation Organization (ICAO). Our system accepts text as input, that is, either gold annotations or transcripts generated by an ABSR system. This approach achieves an average speaker role identification accuracy of 83%. Finally, we show that training AMs separately for each task, or using a multitask approach, is well suited to the noisy data compared to the traditional ASR system, where all data is pooled together for AM training.
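
To illustrate the general idea of classifying speaker roles from text alone, the following is a minimal Python sketch of a keyword-based classifier. It is not the grammar or the system described in the paper: the cue lists `ATCO_CUES` and `PILOT_CUES`, the function names, and the tie-breaking rule are all hypothetical choices made for illustration, and an actual ICAO-based grammar would be considerably richer.

```python
import re

# Hypothetical cue phrases, chosen for illustration only.
# They are NOT the grammar rules used in the paper.
ATCO_CUES = ("cleared", "contact", "identified", "report", "expect")
PILOT_CUES = ("wilco", "request", "requesting", "with you", "we are",
              "ready for departure")


def count_cues(text, cues):
    """Count whole-word occurrences of each cue phrase in the text."""
    return sum(
        len(re.findall(r"\b" + re.escape(cue) + r"\b", text))
        for cue in cues
    )


def classify_role(transcript):
    """Return 'ATCO', 'pilot', or 'unknown' for a single transcript.

    The input can be a gold annotation or an ASR hypothesis, since only
    the text is inspected.
    """
    text = transcript.lower()
    atco_score = count_cues(text, ATCO_CUES)
    pilot_score = count_cues(text, PILOT_CUES)
    if atco_score > pilot_score:
        return "ATCO"
    if pilot_score > atco_score:
        return "pilot"
    return "unknown"  # no cue found, or a tie between both roles


if __name__ == "__main__":
    print(classify_role(
        "lufthansa one two three descend flight level eight zero contact tower"
    ))
    print(classify_role("requesting descent lufthansa one two three"))
```

A purely lexical rule like this would likely confuse pilot readbacks with ATCO instructions, since readbacks repeat the controller's words. Separating them would need structural cues as well; callsign position within the utterance is one plausible candidate.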

Item URL in elib: https://elib.dlr.de/189422/
Document Type: Conference or Workshop Item (Speech)
Title: Grammar Based Speaker Role Identification for Air Traffic Control Speech Recognition
Authors:
Author | Institution or Email of Authors | Author's ORCID iD | ORCID Put Code
Prasad, Amrutha | Idiap, BUT | UNSPECIFIED | UNSPECIFIED
Zuluaga-Gomez, Juan Pablo | Idiap, EPFL | UNSPECIFIED | UNSPECIFIED
Motlicek, Petr | Idiap, BUT | UNSPECIFIED | UNSPECIFIED
Sarfjoo, Saeed | Idiap | UNSPECIFIED | UNSPECIFIED
Nigmatulina, Iuliia | Idiap, University of Zurich | UNSPECIFIED | UNSPECIFIED
Ohneiser, Oliver | UNSPECIFIED | https://orcid.org/0000-0002-5411-691X | UNSPECIFIED
Helmke, Hartmut | UNSPECIFIED | https://orcid.org/0000-0002-1939-0200 | UNSPECIFIED
Date: 2022
Journal or Publication Title: SESAR Innovation Days 2022
Refereed publication: Yes
Open Access: Yes
Gold Open Access: No
In SCOPUS: No
In ISI Web of Science: No
Status: Published
Keywords: assistant based speech recognition, air traffic management, multitask acoustic modeling, speaker role classification, Kaldi
Event Title: SESAR Innovation Days 2022
Event Location: Budapest, Hungary
Event Type: International Conference
Event Start Date: 5 December 2022
Event End Date: 8 December 2022
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Aeronautics
HGF - Program Themes: Air Transportation and Impact
DLR - Research area: Aeronautics
DLR - Program: L AI - Air Transportation and Impact
DLR - Research theme (Project): L - Integrated Flight Guidance
Location: Braunschweig
Institutes and Institutions: Institute of Flight Guidance > Controller Assistance
Deposited By: Diederich, Kerstin
Deposited On: 19 Dec 2022 11:14
Last Modified: 24 Apr 2024 20:50
