
Can YouTube Stream Recordings Improve Speech Recognition for Air Traffic Control?

Wüstenbecker, Niclas and Ohneiser, Oliver and Kleinert, Matthias (2026) Can YouTube Stream Recordings Improve Speech Recognition for Air Traffic Control? Journal of Open Aviation Science. TU Delft. doi: 10.59490/joas.2026.8477. ISSN 2773-1626.

Full text: PDF, published version (2 MB)

Abstract

Automatic speech recognition (ASR) for air traffic control (ATC) faces severe training data scarcity because operational recordings are restricted and transcription requires expensive domain experts. We address this limitation with an automated pipeline that extracts large-scale, high-quality training data from publicly available YouTube streams of virtual ATC simulator sessions on networks such as VATSIM and IVAO. Our approach systematically processes over 2,000 hours of content from 709 videos, covering virtual airports and airspaces in 17 countries across multiple continents, four operational domains (ground, tower, approach, en-route), and diverse speaker accents. The pipeline employs speaker diarization for utterance segmentation, parallel transcription using three complementary ASR architectures with distinct error characteristics, and large language model (LLM)-based transcript fusion that synthesizes improved pseudo-labels while filtering non-ATC content. Manual verification on a stratified 120-minute evaluation set shows a 10.2% word error rate for controller speech and 18.3% for pilot speech, a 37% relative improvement over the best individual model, which establishes pseudo-label quality sufficient for downstream model training. We show the feasibility of this approach by training a compact 115M-parameter ASR model exclusively on automatically generated transcripts, without any manually annotated operational data. On the operational ATCO2 benchmark, the model reaches a 21.1% word error rate compared to 35.6% for published baselines trained on smaller manually transcribed datasets, despite the domain gap between virtual and operational ATC, while achieving approximately five times faster inference. These results demonstrate that large-scale, geographically and acoustically diverse pseudo-labeled data can effectively compensate for moderate label noise when training specialized-domain speech recognition systems.
We openly release the complete processing pipeline, curated video collection, and our trained model to enable reproducible research.
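
The word error rate (WER) figures quoted in the abstract can be read with the following minimal sketch. It is not taken from the released pipeline: the function names `word_error_rate` and `relative_improvement` and the sample utterance are assumptions made for illustration. It uses the standard definition of WER as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer is lower than baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical utterance: one of nine reference words is dropped.
reference = "lufthansa four five six descend flight level eight zero"
hypothesis = "lufthansa four five six descend level eight zero"
sample_wer = word_error_rate(reference, hypothesis)

# ATCO2 figures from the abstract: 35.6% (published baselines) vs. 21.1%.
atco2_gain = relative_improvement(35.6, 21.1)
```

With the ATCO2 numbers from the abstract, `relative_improvement(35.6, 21.1)` evaluates to roughly 0.41, that is, about a 41% relative reduction in WER against the published baselines.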

Item URL in elib: https://elib.dlr.de/223725/
Document Type: Article
Title: Can YouTube Stream Recordings Improve Speech Recognition for Air Traffic Control?
Authors (name | institution or email | ORCID iD | ORCID put code):
Wüstenbecker, Niclas | niclas.wuestenbecker (at) dlr.de | https://orcid.org/0009-0000-3440-8635 | 210395293
Ohneiser, Oliver | Oliver.Ohneiser (at) dlr.de | https://orcid.org/0000-0002-5411-691X | 210395294
Kleinert, Matthias | Matthias.Kleinert (at) dlr.de | https://orcid.org/0000-0002-0782-4147 | UNSPECIFIED
Date: 20 March 2026
Journal or Publication Title: Journal of Open Aviation Science
Refereed publication: No
Open Access: Yes
Gold Open Access: No
In SCOPUS: No
In ISI Web of Science: No
DOI: 10.59490/joas.2026.8477
Publisher: TU Delft
ISSN: 2773-1626
Status: Published
Keywords: Air Traffic Control, Automatic Speech Recognition, Public Dataset, Large Language Model
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Aeronautics
HGF - Program Themes: Air Transportation and Impact
DLR - Research area: Aeronautics
DLR - Program: L AI - Air Transportation and Impact
DLR - Research theme (Project): L - Integrated Flight Guidance
Location: Braunschweig
Institutes and Institutions: Institute of Flight Guidance > Controller Assistance
Deposited By: Wüstenbecker, Niclas
Deposited On: 01 Apr 2026 11:16
Last Modified: 01 Apr 2026 11:16
