Integrating Domain Knowledge into Transformer-based Approaches to Vulnerability Detection

Jeong, Seunghee (2023) Integrating Domain Knowledge into Transformer-based Approaches to Vulnerability Detection. Master's thesis, Ludwig-Maximilians-Universität München.

Abstract

Vulnerability detection is critical for ensuring the security and integrity of software systems. Traditional methods such as Static Application Security Testing (SAST) and Dynamic Application Security Testing (DAST) have limitations. SAST, while effective at identifying vulnerabilities early in the development cycle, often produces a high rate of false positives and struggles to account for runtime context. DAST, on the other hand, can detect vulnerabilities in a running application but has no access to the source code and finds issues only late in the software lifecycle. To address these limitations, vulnerability detection has increasingly adopted machine learning models. Initially, the focus was on Recurrent Neural Network (RNN)-based models such as LSTM, BiLSTM, and BiGRU, along with Convolutional Neural Network (CNN)-based variants. More recently, the field has shifted towards transformer-based models, noted for their strong performance in natural language processing tasks and their proficiency in interpreting programming languages. This study leverages the strengths of transformer-based models, particularly those tailored for programming languages, to enhance vulnerability detection by integrating domain knowledge, specifically the Common Weakness Enumeration (CWE) hierarchy, into programming-language-specific transformer-based models. We investigate the efficacy of these models through two distinct classification approaches: standard classification and hierarchical classification using a deep classifier. Our primary objective is to assess the impact of integrating domain knowledge, particularly through hierarchical methods, on model performance.
This exploration aims to delineate how such integration influences outcomes compared to traditional classification methods, thereby providing insights into the potential advantages of domain-specific enhancements that add a novel dimension to the semantic and syntactic analysis of source code. Our hierarchical approach using various loss weights outperformed standard classification with Focal Loss in multiclass classification. These approaches also performed well in binary classification, even though the models were fine-tuned for the multiclass task rather than the binary one. This indicates that our approaches enable broader learning of semantic and syntactic knowledge in vulnerability detection tasks using transformer-based models and suggests a promising direction for future research and application in the field.
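The abstract names Focal Loss, loss weights, and a CWE hierarchy but does not spell out how they are combined. The following is a minimal sketch under stated assumptions, not the thesis's implementation: it assumes a two-level hierarchy, a hierarchical loss formed as a weighted sum of leaf-level and parent-level cross-entropy, and a binary decision derived from the multiclass output. The `PARENT` mapping is a simplified, illustrative slice of CWE, and all function names, the `safe` label, and the weights `w_leaf` and `w_parent` are hypothetical.

```python
import math

# Illustrative, simplified slice of the CWE hierarchy (leaf -> parent).
# "safe" is a hypothetical label for non-vulnerable code.
PARENT = {
    "CWE-121": "CWE-787",
    "CWE-122": "CWE-787",
    "CWE-415": "CWE-825",
    "CWE-416": "CWE-825",
    "safe": "safe",
}


def softmax(logits):
    """Turn a dict of per-class logits into a dict of probabilities."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def focal_loss(probs, target, gamma=2.0):
    """Multiclass focal loss without class weighting: -(1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def parent_probs(leaf_probs):
    """Parent-level probability = sum of the probabilities of its leaves."""
    out = {}
    for leaf, p in leaf_probs.items():
        out[PARENT[leaf]] = out.get(PARENT[leaf], 0.0) + p
    return out


def hierarchical_loss(leaf_probs, target, w_leaf=1.0, w_parent=0.5):
    """Assumed form: weighted sum of cross-entropy at leaf and parent level."""
    leaf_ce = -math.log(leaf_probs[target])
    parent_ce = -math.log(parent_probs(leaf_probs)[PARENT[target]])
    return w_leaf * leaf_ce + w_parent * parent_ce


def is_vulnerable(leaf_probs):
    """Binary decision from a multiclass output: any class other than 'safe'."""
    return max(leaf_probs, key=leaf_probs.get) != "safe"


probs = softmax({"CWE-121": 2.0, "CWE-122": 1.0, "CWE-415": 0.0,
                 "CWE-416": 0.0, "safe": 0.5})
print(focal_loss(probs, "CWE-121"))
print(hierarchical_loss(probs, "CWE-121"))
print(is_vulnerable(probs))
```

In this sketch, a prediction that confuses two siblings (for example CWE-121 and CWE-122) still gets credit at the parent level, which is one plausible way a hierarchy could inform training. The last function illustrates how a model trained only on the multiclass task can be evaluated on a binary one.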

elib URL of this record:

https://elib.dlr.de/201141/

Document type:

Thesis (Master's thesis)

Title:

Integrating Domain Knowledge into Transformer-based Approaches to Vulnerability Detection

Authors:

Authors	Institution or e-mail address	Author ORCID iD	ORCID Put Code
Jeong, Seunghee	seunghee.jeong (at) dlr.de	NOT SPECIFIED	NOT SPECIFIED

Date:

30 November 2023

Refereed publication:

No

Open Access:

Gold Open Access:

No

In SCOPUS:

No

In ISI Web of Science:

No

Number of pages:

Status:

Published

Keywords:

Vulnerability Detection; Machine Learning

Institution:

Ludwig-Maximilians-Universität München

Department:

Fakultät für Mathematik, Informatik und Statistik

HGF - Research field:

Aeronautics, Space and Transport

HGF - Program:

Space

HGF - Program topic:

Space System Technology

DLR - Research area:

Space

DLR - Research field:

R SY - Space System Technology

DLR - Research theme (project):

R - Intelligent Analyses and Methods for Secure Software Development

Location:

Jena

Institutes and institutions:

Institute of Data Science > Data Acquisition and Mobilisation

Deposited by:

Brust, Dr. Clemens-Alexander

Deposited on:

22 Dec 2023 08:26

Last modified:

03 Jan 2024 13:25
