ROMEO: A Binary Vulnerability Detection Dataset for Exploring Juliet through the Lens of Assembly Language

Brust, Clemens-Alexander and Sonnekalb, Tim and Gruner, Bernd (2023) ROMEO: A Binary Vulnerability Detection Dataset for Exploring Juliet through the Lens of Assembly Language. Computers and Security, 128, e103165. Elsevier. doi: 10.1016/j.cose.2023.103165. ISSN 0167-4048.

PDF: Preprint version (submitted draft), 214 kB

Official URL: https://www.sciencedirect.com/science/article/pii/S0167404823000755

Abstract

Context: Automatic vulnerability detection on C/C++ source code has benefitted from the introduction of machine learning to the field, with many recent publications targeting this combination. In contrast, assembly language or machine code artifacts receive less attention, although there are compelling reasons to study them. They are more representative of what is executed, more easily incorporated in dynamic analysis, and in the case of closed-source code, there is no alternative.

Objective: We evaluate the representative capability of assembly language compared to C/C++ source code for vulnerability detection. Furthermore, we investigate the role of call graph context in detecting function-spanning vulnerabilities. Finally, we verify whether compiling a benchmark dataset compromises an experiment's soundness by inadvertently leaking label information.

Method: We propose ROMEO, a publicly available, reproducible and reusable binary vulnerability detection benchmark dataset derived from the synthetic Juliet test suite. Alongside, we introduce a simple text-based assembly language representation that includes context for function-spanning vulnerability detection and semantics to detect high-level vulnerabilities. It is constructed by disassembling the .text segment of the respective binaries.

Results: We evaluate an x86 assembly language representation of the compiled dataset, combined with an off-the-shelf classifier. It compares favorably to state-of-the-art methods, including those operating on the full C/C++ code. Including context information using the call graph improves detection of function-spanning vulnerabilities. There is no label information leaked during the compilation process.

Conclusion: Performing vulnerability detection on a compiled program instead of the source code is a worthwhile tradeoff. While certain information is lost, e.g., comments and certain identifiers, other valuable information is gained, e.g., about compiler optimizations.
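
The abstract mentions a text-based assembly representation that adds call graph context for function-spanning vulnerabilities. As a rough illustration of that idea only, the sketch below appends the bodies of directly called functions to a function's own instruction text. It is not the authors' implementation: the `DISASSEMBLY` data is hand-written, and the names `build_call_graph` and `function_with_context` as well as the `max_depth` parameter are hypothetical choices made for this example.

```python
import re
from collections import deque

# Hypothetical, hand-written disassembly: function name -> instruction lines.
# Illustrative data only; it is not taken from the ROMEO dataset.
DISASSEMBLY = {
    "main": ["push rbp", "mov rbp, rsp", "call bad_sink", "pop rbp", "ret"],
    "bad_sink": ["sub rsp, 0x40", "call helper", "add rsp, 0x40", "ret"],
    "helper": ["mov eax, 0x0", "ret"],
}

# Assumption: direct calls appear as "call <symbol>" in the text.
CALL_RE = re.compile(r"^call\s+(\w+)$")


def build_call_graph(disassembly):
    """Map each function to the known functions it calls directly."""
    graph = {}
    for name, instructions in disassembly.items():
        callees = []
        for line in instructions:
            match = CALL_RE.match(line.strip())
            if not match:
                continue
            target = match.group(1)
            # Keep only calls to functions we have text for, without duplicates.
            if target in disassembly and target not in callees:
                callees.append(target)
        graph[name] = callees
    return graph


def function_with_context(name, disassembly, max_depth=1):
    """Return the text of `name` followed by its callees, breadth-first,
    following the call graph for at most `max_depth` levels."""
    graph = build_call_graph(disassembly)
    seen = {name}
    queue = deque([(name, 0)])
    blocks = []
    while queue:
        current, depth = queue.popleft()
        blocks.append(f"{current}:\n" + "\n".join(disassembly[current]))
        if depth < max_depth:
            for callee in graph[current]:
                if callee not in seen:
                    seen.add(callee)
                    queue.append((callee, depth + 1))
    return "\n\n".join(blocks)
```

Calling `function_with_context("main", DISASSEMBLY, max_depth=1)` yields the text of `main` followed by `bad_sink`, which a text classifier could then consume as a single sample. The depth limit is one possible way to bound the sample size; the paper itself should be consulted for how context is actually selected.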

Item URL in elib: https://elib.dlr.de/194605/
Document Type: Article
Title: ROMEO: A Binary Vulnerability Detection Dataset for Exploring Juliet through the Lens of Assembly Language
Authors:
Brust, Clemens-Alexander (ORCID iD: https://orcid.org/0000-0001-5419-1998)
Sonnekalb, Tim (ORCID iD: https://orcid.org/0000-0002-0067-1790)
Gruner, Bernd (ORCID iD: https://orcid.org/0000-0002-4177-2993)
Date: 2023
Journal or Publication Title: Computers and Security
Refereed publication: Yes
Open Access: Yes
Gold Open Access: No
In SCOPUS: Yes
In ISI Web of Science: Yes
Volume: 128
DOI: 10.1016/j.cose.2023.103165
Page Range: e103165
Publisher: Elsevier
ISSN: 0167-4048
Status: Published
Keywords: Vulnerability Detection, Assembly Language, Machine Learning
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Space System Technology
DLR - Research area: Space (Raumfahrt)
DLR - Program: R SY - Space System Technology
DLR - Research theme (Project): R - Intelligent analysis and methods for safe software development
Location: Jena
Institutes and Institutions: Institute of Data Science > Data Acquisition and Mobilisation; Institute of Data Science
Deposited By: Brust, Dr. Clemens-Alexander
Deposited On: 14 Apr 2023 11:36
Last Modified: 14 Apr 2023 11:36
