ROMEO: A Binary Vulnerability Detection Dataset for Exploring Juliet through the Lens of Assembly Language

Brust, Clemens-Alexander and Sonnekalb, Tim and Gruner, Bernd (2023) ROMEO: A Binary Vulnerability Detection Dataset for Exploring Juliet through the Lens of Assembly Language. Computers and Security, 128, e103165. Elsevier. doi: 10.1016/j.cose.2023.103165. ISSN 0167-4048.

PDF - Preprint version (submitted draft)
214kB

Official URL: https://www.sciencedirect.com/science/article/pii/S0167404823000755

Abstract

Context: Automatic vulnerability detection on C/C++ source code has benefitted from the introduction of machine learning to the field, with many recent publications targeting this combination. In contrast, assembly language or machine code artifacts receive less attention, although there are compelling reasons to study them. They are more representative of what is executed, more easily incorporated in dynamic analysis, and in the case of closed-source code, there is no alternative.

Objective: We evaluate the representative capability of assembly language compared to C/C++ source code for vulnerability detection. Furthermore, we investigate the role of call graph context in detecting function-spanning vulnerabilities. Finally, we verify whether compiling a benchmark dataset compromises an experiment's soundness by inadvertently leaking label information.

Method: We propose ROMEO, a publicly available, reproducible and reusable binary vulnerability detection benchmark dataset derived from the synthetic Juliet test suite. Alongside, we introduce a simple text-based assembly language representation that includes context for function-spanning vulnerability detection and semantics to detect high-level vulnerabilities. It is constructed by disassembling the .text segment of the respective binaries.

Results: We evaluate an x86 assembly language representation of the compiled dataset, combined with an off-the-shelf classifier. It compares favorably to state-of-the-art methods, including those operating on the full C/C++ code. Including context information using the call graph improves detection of function-spanning vulnerabilities. There is no label information leaked during the compilation process.

Conclusion: Performing vulnerability detection on a compiled program instead of the source code is a worthwhile tradeoff. While certain information is lost, e.g., comments and certain identifiers, other valuable information is gained, e.g., about compiler optimizations.
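
The abstract describes a text-based assembly representation that adds call graph context to a function's own instructions. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the disassembly is already available as a mapping from function name to instruction strings, and that direct calls appear as "call <name>". The names `listing`, `call_graph` and `with_context`, the `depth` parameter and the sample instructions are all hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical input format: function name -> disassembled instructions.
# The instructions below are illustrative, not taken from the ROMEO dataset.
listing: Dict[str, List[str]] = {
    "main": ["push rbp", "call bad", "pop rbp", "ret"],
    "bad": ["sub rsp, 0x40", "call bad_sink", "ret"],
    "bad_sink": ["mov rdi, rax", "call strcpy", "ret"],
}

# Assumption: a direct call is written as "call <symbol>".
CALL_RE = re.compile(r"^\s*call\s+([A-Za-z_]\w*)")


def call_graph(funcs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each function to the known functions it calls, in first-call order."""
    graph: Dict[str, List[str]] = {}
    for name, instructions in funcs.items():
        callees: List[str] = []
        for ins in instructions:
            match = CALL_RE.match(ins)
            # Calls to symbols without a body in `funcs` (e.g. libc) are skipped.
            if match and match.group(1) in funcs and match.group(1) not in callees:
                callees.append(match.group(1))
        graph[name] = callees
    return graph


def with_context(funcs: Dict[str, List[str]], root: str, depth: int = 1) -> str:
    """Return root's text followed by callees up to `depth` call edges away."""
    graph = call_graph(funcs)
    seen = {root}
    order = [root]
    frontier = [root]
    for _ in range(depth):
        next_frontier: List[str] = []
        for fn in frontier:
            for callee in graph[fn]:
                if callee not in seen:
                    seen.add(callee)
                    next_frontier.append(callee)
        order.extend(next_frontier)
        frontier = next_frontier
    lines: List[str] = []
    for fn in order:
        lines.append(f"{fn}:")
        lines.extend(funcs[fn])
    return "\n".join(lines)
```

With `depth=0` the text contains only the function itself; raising `depth` appends the bodies of callees, which is one way a classifier could be given access to a vulnerability that spans several functions.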

elib URL of this record:

https://elib.dlr.de/194605/

Document type:

Journal article

Title:

ROMEO: A Binary Vulnerability Detection Dataset for Exploring Juliet through the Lens of Assembly Language

Authors:

Author	Institution or email address	Author ORCID iD	ORCID Put Code
Brust, Clemens-Alexander	clemens-alexander.brust (at) dlr.de	https://orcid.org/0000-0001-5419-1998	147856943
Sonnekalb, Tim	Tim.Sonnekalb (at) dlr.de	https://orcid.org/0000-0002-0067-1790	NOT SPECIFIED
Gruner, Bernd	Bernd.Gruner (at) dlr.de	https://orcid.org/0000-0002-4177-2993	147856947

Date:

7 March 2023

Published in:

Computers and Security

Refereed publication:

Open Access:

Gold Open Access:

No

In SCOPUS:

In ISI Web of Science:

Volume:

128

DOI:

10.1016/j.cose.2023.103165

Page range:

e103165

Publisher:

Elsevier

ISSN:

0167-4048

Status:

published

Keywords:

Vulnerability Detection, Assembly Language, Machine Learning

HGF - Research field:

Aeronautics, Space and Transport

HGF - Program:

Space

HGF - Program theme:

Space System Technology

DLR - Research area:

Space

DLR - Program:

R SY - Space System Technology

DLR - Research theme (Project):

R - Intelligent Analyses and Methods for Secure Software Development

Location:

Jena

Institutes and Institutions:

Institute of Data Science > Data Acquisition and Mobilisation
Institute of Data Science

Deposited by:

Brust, Dr. Clemens-Alexander

Deposited on:

14 Apr 2023 11:36

Last modified:

01 Dec 2023 09:02
