Gokhale, Manasi and Hassan, Teena Chakkalayil and Houben, Sebastian (2026) Evaluating ASR improvements for digital documentation in laboratories via fine-tuning, semantics, and post-Processing. Master's thesis, Hochschule Bonn-Rhein-Sieg.
This archive cannot provide the full text.
Abstract
Automatic Speech Recognition (ASR) models often struggle with domain-specific tasks because they have limited knowledge of the terminology and contexts unique to those domains. Improving performance in such domains therefore requires adapting ASR models using relevant supervised data. Such data is not readily available and is time-consuming to produce.
This thesis examines the effective adaptation of ASR models for the chemistry and scientific laboratory domain. To address data scarcity, a domain-specific dataset was created, consisting of both real and synthetic audio samples for training, as well as a separate evaluation set. Several ASR models, including Vosk, Wav2Vec 2, SpeechT5, and Whisper, were evaluated to identify the most suitable baseline model for further adaptation. The Whisper-large-v2 model demonstrated the strongest performance and was selected for subsequent improvement.
Two complementary adaptation strategies were explored: fine-tuning the Whisper model on domain-specific data, and post-processing ASR outputs using Large Language Models (LLMs). Fine-tuning provided modest performance gains, while a dedicated LLM-based correction pipeline, enhanced with terminology derived from domain ontologies, yielded substantial improvements in transcription accuracy and contextual consistency.
Overall, the thesis contributes (i) a domain-specific dataset, (ii) a comprehensive analysis of ASR models, and (iii) effective strategies for adapting ASR systems to specialized scientific domains. These findings highlight practical pathways for improving ASR performance in specialized domains.
| elib URL of the record: | https://elib.dlr.de/223736/ |
|---|---|
| Document type: | Thesis (Master's thesis) |
| Title: | Evaluating ASR improvements for digital documentation in laboratories via fine-tuning, semantics, and post-Processing |
| Authors: | |
| DLR supervisor: | |
| Date: | 2026 |
| Open Access: | No |
| Number of pages: | 77 |
| Status: | Submitted contribution |
| Keywords: | Automatic Speech Recognition (ASR), Domain ontologies, Post-processing correction, Speech transcription |
| Institution: | Hochschule Bonn-Rhein-Sieg |
| Department: | Computer Science |
| HGF - Research field: | Aeronautics, Space and Transport |
| HGF - Program: | Aeronautics |
| HGF - Program theme: | no assignment |
| DLR - Research area: | Aeronautics |
| DLR - Program: | L - no assignment |
| DLR - Research theme (Project): | L - no assignment |
| Location: | Jena |
| Institutes and Institutions: | Institut für Datenwissenschaften > Datenmanagement und -aufbereitung |
| Deposited by: | Gokhale, Manasi |
| Deposited on: | 13 Apr 2026 16:21 |
| Last modified: | 13 Apr 2026 16:21 |