
Integrating Domain Knowledge into Transformer-based Approaches to Vulnerability Detection

Jeong, Seunghee (2023) Integrating Domain Knowledge into Transformer-based Approaches to Vulnerability Detection. Master's, Ludwig-Maximilians-Universität München.


Abstract

The field of vulnerability detection in cybersecurity is critical for ensuring the security and integrity of software systems. Traditional methods like Static Application Security Testing (SAST) and Dynamic Application Security Testing (DAST) have limitations. SAST, while effective in identifying vulnerabilities early in the development cycle, often produces a high rate of false positives and struggles to understand the runtime context. DAST, on the other hand, can detect vulnerabilities in a running application but is limited by its inability to access the source code and by its late position in the software lifecycle. In response, vulnerability detection has evolved significantly and now embraces advanced machine learning models. Initially, the focus was on Recurrent Neural Network (RNN)-based models such as LSTM, BiLSTM, and BiGRU, along with Convolutional Neural Network (CNN)-based variants. However, the field has recently shifted towards transformer-based models, noted for their exceptional performance in natural language processing tasks and their proficiency in interpreting programming languages. This study leverages the strengths of transformer-based models, particularly those tailored for programming languages, to enhance vulnerability detection by integrating domain knowledge, specifically the Common Weakness Enumeration (CWE) hierarchy, into programming language-specific transformer-based models. We investigate the efficacy of transformer-based models through two distinct classification approaches: standard classification and hierarchical classification using a deep classifier. Our primary objective is to assess the impact of integrating domain knowledge, particularly in the context of hierarchical methods, on model performance.
This exploration aims to delineate how such integration influences outcomes compared to traditional classification methods, thereby providing insights into the potential advantages of domain-specific enhancements in transformer-based models, which add a novel dimension to the semantic and syntactic analysis of source code. Our hierarchical approach using various loss weights outperformed standard classification with Focal Loss in multiclass classification. These approaches also showed high performance in binary classification, even though the models were fine-tuned for the multiclass classification task and not for the binary one. This indicates that our approaches enable broader learning of semantic and syntactic knowledge in vulnerability detection tasks using transformer-based models and suggests a promising direction for future research and application in the field.
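
The abstract names two loss setups, Focal Loss for standard classification and per-level loss weights for the hierarchical classifier, without giving their formulas. The following is a minimal sketch of the general idea in plain Python, not the thesis's implementation. The function names, the default `gamma` and `alpha` values, and the choice to combine hierarchy levels as a weighted sum of per-level cross-entropy terms are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    gamma and alpha are illustrative defaults, not values from the thesis.
    With gamma=0 and alpha=1 this reduces to plain cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


def hierarchical_loss(level_logits, level_targets, level_weights):
    """Weighted sum of per-level cross-entropy losses (assumed formulation).

    Each entry corresponds to one level of a label hierarchy, for example
    a coarse CWE category followed by a more specific CWE entry.
    """
    return sum(
        w * focal_loss(logits, target, gamma=0.0)
        for logits, target, w in zip(level_logits, level_targets, level_weights)
    )


# Hypothetical scores: 3 coarse classes at level 0, 4 fine classes at level 1.
coarse_logits = [2.0, 0.5, -1.0]
fine_logits = [0.1, 1.5, -0.3, 0.0]
loss = hierarchical_loss([coarse_logits, fine_logits], [0, 1], [1.0, 0.5])
```

Raising `gamma` shrinks the contribution of samples the model already classifies confidently, which is why Focal Loss is commonly used on imbalanced label sets. The level weights control how strongly each hierarchy level influences training.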

Item URL in elib:https://elib.dlr.de/201141/
Document Type:Thesis (Master's)
Title:Integrating Domain Knowledge into Transformer-based Approaches to Vulnerability Detection
Authors:Jeong, Seunghee (Institution or Email: UNSPECIFIED; ORCID iD: UNSPECIFIED; ORCID Put Code: UNSPECIFIED)
Date:30 November 2023
Refereed publication:No
Open Access:Yes
Gold Open Access:No
In SCOPUS:No
In ISI Web of Science:No
Number of Pages:81
Status:Published
Keywords:Vulnerability Detection; Machine Learning
Institution:Ludwig-Maximilians-Universität München
Department:Fakultät für Mathematik, Informatik und Statistik
HGF - Research field:Aeronautics, Space and Transport
HGF - Program:Space
HGF - Program Themes:Space System Technology
DLR - Research area:Space
DLR - Program:R SY - Space System Technology
DLR - Research theme (Project):R - Intelligent analysis and methods for safe software development
Location: Jena
Institutes and Institutions:Institute of Data Science > Data Acquisition and Mobilisation
Deposited By: Brust, Dr. Clemens-Alexander
Deposited On:22 Dec 2023 08:26
Last Modified:03 Jan 2024 13:25
