Evaluating ASR improvements for digital documentation in laboratories via fine-tuning, semantics, and post-Processing

Gokhale, Manasi and Hassan, Teena Chakkalayil and Houben, Sebastian (2026) Evaluating ASR improvements for digital documentation in laboratories via fine-tuning, semantics, and post-Processing. Master's thesis, Hochschule Bonn-Rhein-Sieg.

This archive cannot provide the full text.

Abstract

Automatic Speech Recognition (ASR) models often struggle with domain-specific tasks because they have limited knowledge of the terminology and context unique to those domains. Improving performance in such domains therefore requires adapting ASR models with relevant supervised data, which is not readily available and can be time-consuming to produce.

This thesis examines how to effectively adapt ASR models to the chemistry and scientific laboratory domain. To address data scarcity, a domain-specific dataset was created, consisting of real and synthetic audio samples for training as well as a separate evaluation set. Several ASR models, including Vosk, Wav2Vec 2, SpeechT5, and Whisper, were evaluated to identify the most suitable baseline for further adaptation. The Whisper-large-v2 model performed best and was selected for subsequent improvement.

Two complementary adaptation strategies were explored: fine-tuning the Whisper model on domain-specific data, and post-processing ASR outputs with Large Language Models (LLMs). Fine-tuning yielded modest performance gains, whereas a dedicated LLM-based correction pipeline, enhanced with terminology derived from domain ontologies, substantially improved transcription accuracy and contextual consistency.

Overall, the thesis contributes (i) a domain-specific dataset, (ii) a comprehensive analysis of ASR models, and (iii) effective strategies for adapting ASR systems to specialized scientific domains. Together, these findings point to practical pathways for improving ASR performance in such settings.

elib URL of this record:

https://elib.dlr.de/223736/

Document type:

University thesis (Master's thesis)

Title:

Evaluating ASR improvements for digital documentation in laboratories via fine-tuning, semantics, and post-Processing

Authors:

Author	Institution or email address	Author ORCID iD	ORCID put code
Gokhale, Manasi	manasi.gokhale (at) dlr.de	https://orcid.org/0009-0002-6729-8107	not specified
Hassan, Teena Chakkalayil	teena.hassan (at) h-brs.de	not specified	not specified
Houben, Sebastian	sebastian.houben (at) h-brs.de	not specified	not specified

DLR supervisor:

Contribution type	DLR supervisor	Institution or email address	DLR supervisor ORCID iD
Thesis advisor	Dembska, Marta	Marta.Dembska (at) dlr.de	https://orcid.org/0000-0002-8180-1525

Date:

2026

Open Access:

No

Number of pages:

Status:

Submitted contribution

Keywords:

Automatic Speech Recognition (ASR), Domain ontologies, Post-processing correction, Speech transcription

Institution:

Hochschule Bonn-Rhein-Sieg

Department:

Computer Science

HGF - Research field:

Aeronautics, Space and Transport

HGF - Program:

Aeronautics

HGF - Program topic:

not assigned

DLR - Research area:

Aeronautics

DLR - Research field:

L - not assigned

DLR - Subfield (project, undertaking):

L - not assigned

Location:

Jena

Institutes & facilities:

Institut für Datenwissenschaften > Datenmanagement und -aufbereitung

Deposited by:

Gokhale, Manasi

Deposited on:

13 Apr 2026 16:21

Last modified:

13 Apr 2026 16:21
