Jeong, Seunghee (2023) Integrating Domain Knowledge into Transformer-based Approaches to Vulnerability Detection. Master's thesis, Ludwig-Maximilians-Universität München.
PDF (6 MB)
Abstract
Vulnerability detection is critical for ensuring the security and integrity of software systems. Traditional methods such as Static Application Security Testing (SAST) and Dynamic Application Security Testing (DAST) have limitations. SAST, while effective at identifying vulnerabilities early in the development cycle, often produces a high rate of false positives and struggles to understand the runtime context. DAST, on the other hand, can detect vulnerabilities in a running application but is limited by its inability to access the source code and by detecting vulnerabilities only late in the software lifecycle. Meanwhile, vulnerability detection has evolved significantly and now embraces advanced machine learning models. Initially, the focus was on Recurrent Neural Network (RNN)-based models such as LSTM, BiLSTM, and BiGRU, along with Convolutional Neural Network (CNN)-based variants. However, the field has recently shifted towards transformer-based models, noted for their exceptional performance in natural language processing tasks and their proficiency in interpreting programming languages. This study leverages the strengths of transformer-based models, particularly those tailored for programming languages, to enhance vulnerability detection by integrating domain knowledge, specifically the Common Weakness Enumeration (CWE) hierarchy, into programming language-specific transformer-based models. We investigate the efficacy of these models through two distinct classification approaches: standard classification and hierarchical classification using a deep classifier. Our primary objective is to assess the impact of integrating domain knowledge, particularly in the context of hierarchical methods, on model performance.
This exploration aims to delineate how such integration influences outcomes compared to traditional classification methods, thereby providing insights into the potential advantages of domain-specific enhancements in transformer-based models, which add a novel dimension to the semantic and syntactic analysis of source code. Our hierarchical approach using various loss weights outperformed standard classification with Focal Loss in multiclass classification. These approaches also achieved high performance in binary classification, even though the models were fine-tuned for the multiclass classification task rather than the binary one. This indicates that our approaches enable broader learning of semantic and syntactic knowledge in vulnerability detection tasks using transformer-based models and suggests a promising direction for future research and application in the field.
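The abstract names Focal Loss as the baseline objective for standard multiclass classification but does not give its form. As a rough illustration only, the sketch below uses the commonly cited formulation FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t); the function names, the default values of `gamma` and `alpha`, and the example probabilities are assumptions for illustration and are not taken from the thesis.

```python
import math


def cross_entropy(probs, target):
    """Plain cross-entropy for one sample: -log(p_t)."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are assumed defaults, not values from the thesis.
    With gamma = 0 and alpha = 1 this reduces to cross-entropy.
    """
    p_t = probs[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical predicted probabilities over three classes (e.g. CWE categories).
easy = [0.9, 0.05, 0.05]  # true class 0 predicted confidently
hard = [0.2, 0.5, 0.3]    # true class 0 predicted poorly

# The modulating factor (1 - p_t)**gamma shrinks the loss of the easy sample
# far more than that of the hard one, shifting focus to hard examples.
easy_ratio = focal_loss(easy, 0) / cross_entropy(easy, 0)
hard_ratio = focal_loss(hard, 0) / cross_entropy(hard, 0)
```

Here `easy_ratio` equals (1 - 0.9)^2 and `hard_ratio` equals (1 - 0.2)^2, so the confidently classified sample contributes much less to the total loss, which is why this kind of loss is often chosen for imbalanced class distributions.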
elib URL of this record: | https://elib.dlr.de/201141/ |
---|---|
Document type: | Thesis (Master's thesis) |
Title: | Integrating Domain Knowledge into Transformer-based Approaches to Vulnerability Detection |
Authors: | Jeong, Seunghee |
Date: | 30 November 2023 |
Refereed publication: | No |
Open Access: | Yes |
Gold Open Access: | No |
In SCOPUS: | No |
In ISI Web of Science: | No |
Number of pages: | 81 |
Status: | Published |
Keywords: | Vulnerability Detection; Machine Learning |
Institution: | Ludwig-Maximilians-Universität München |
Department: | Fakultät für Mathematik, Informatik und Statistik |
HGF - Research field: | Aeronautics, Space and Transport |
HGF - Program: | Space |
HGF - Program theme: | Space System Technology |
DLR - Research area: | Space |
DLR - Research field: | R SY - Space System Technology |
DLR - Research theme (project): | R - Intelligente Analysen und Methoden zur sicheren Softwareentwicklung |
Location: | Jena |
Institutes and institutions: | Institute of Data Science > Data Acquisition and Mobilisation |
Deposited by: | Brust, Dr. Clemens-Alexander |
Deposited on: | 22 Dec 2023 08:26 |
Last modified: | 03 Jan 2024 13:25 |