ROMEO: A Binary Vulnerability Detection Dataset for Exploring Juliet through the Lens of Assembly Language

Brust, Clemens-Alexander and Sonnekalb, Tim and Gruner, Bernd (2023) ROMEO: A Binary Vulnerability Detection Dataset for Exploring Juliet through the Lens of Assembly Language. Computers and Security, 128, e103165. Elsevier. doi: 10.1016/j.cose.2023.103165. ISSN 0167-4048.

PDF - Preprint version (submitted draft version), 214 kB

Official URL: https://www.sciencedirect.com/science/article/pii/S0167404823000755

Abstract

Context: Automatic vulnerability detection on C/C++ source code has benefitted from the introduction of machine learning to the field, with many recent publications targeting this combination. In contrast, assembly language or machine code artifacts receive less attention, although there are compelling reasons to study them. They are more representative of what is executed, more easily incorporated in dynamic analysis, and in the case of closed-source code, there is no alternative.

Objective: We evaluate the representative capability of assembly language compared to C/C++ source code for vulnerability detection. Furthermore, we investigate the role of call graph context in detecting function-spanning vulnerabilities. Finally, we verify whether compiling a benchmark dataset compromises an experiment's soundness by inadvertently leaking label information.

Method: We propose ROMEO, a publicly available, reproducible and reusable binary vulnerability detection benchmark dataset derived from the synthetic Juliet test suite. Alongside, we introduce a simple text-based assembly language representation that includes context for function-spanning vulnerability detection and semantics to detect high-level vulnerabilities. It is constructed by disassembling the .text segment of the respective binaries.

Results: We evaluate an x86 assembly language representation of the compiled dataset, combined with an off-the-shelf classifier. It compares favorably to state-of-the-art methods, including those operating on the full C/C++ code. Including context information using the call graph improves detection of function-spanning vulnerabilities. There is no label information leaked during the compilation process.

Conclusion: Performing vulnerability detection on a compiled program instead of the source code is a worthwhile tradeoff. While certain information is lost, e.g., comments and certain identifiers, other valuable information is gained, e.g., about compiler optimizations.
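The abstract does not spell out how call graph context enters the text representation. The following is a minimal sketch of one possible reading, not the authors' implementation: it assumes each function is already available as a list of disassembled instruction strings, detects direct calls with a simple regular expression, and appends the text of callees up to a chosen depth. The names `callees`, `with_call_graph_context`, `max_depth` and the toy instruction listings are all hypothetical.

```python
import re
from collections import deque

# Assumption: direct calls appear as "call <symbol>" in the disassembly text.
CALL_RE = re.compile(r"^\s*call\s+(\w+)\s*$")


def callees(instructions):
    """Return the symbols directly called in a list of instruction strings."""
    found = []
    for ins in instructions:
        match = CALL_RE.match(ins)
        if match:
            found.append(match.group(1))
    return found


def with_call_graph_context(functions, root, max_depth=1):
    """Join the text of `root` with the text of functions reachable
    within `max_depth` calls (breadth-first over the call graph)."""
    seen = {root}
    queue = deque([(root, 0)])
    parts = []
    while queue:
        name, depth = queue.popleft()
        body = functions.get(name)
        if body is None:
            continue  # external symbol with no disassembly available
        parts.append(f"{name}:")
        parts.extend(f"  {ins}" for ins in body)
        if depth < max_depth:
            for callee in callees(body):
                if callee not in seen:
                    seen.add(callee)
                    queue.append((callee, depth + 1))
    return "\n".join(parts)


# Toy input (hypothetical listings, not taken from the dataset).
functions = {
    "bad": ["push rbp", "mov rbp, rsp", "call helper", "pop rbp", "ret"],
    "helper": ["mov eax, 0", "call memcpy", "ret"],
}
text = with_call_graph_context(functions, "bad", max_depth=1)
print(text)
```

With `max_depth=0` only the root function's text is produced; larger values pull in more of the call graph at the cost of longer inputs for the classifier.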

elib URL of this record: https://elib.dlr.de/194605/
Document type: Journal article
Title: ROMEO: A Binary Vulnerability Detection Dataset for Exploring Juliet through the Lens of Assembly Language
Authors:
Author | Institution or e-mail address | ORCID iD | ORCID Put Code
Brust, Clemens-Alexander | clemens-alexander.brust (at) dlr.de | https://orcid.org/0000-0001-5419-1998 | 147856943
Sonnekalb, Tim | Tim.Sonnekalb (at) dlr.de | https://orcid.org/0000-0002-0067-1790 | not specified
Gruner, Bernd | Bernd.Gruner (at) dlr.de | https://orcid.org/0000-0002-4177-2993 | 147856947
Date: 7 March 2023
Published in: Computers and Security
Peer-reviewed publication: Yes
Open Access: Yes
Gold Open Access: No
In SCOPUS: Yes
In ISI Web of Science: Yes
Volume: 128
DOI: 10.1016/j.cose.2023.103165
Page range: e103165
Publisher: Elsevier
ISSN: 0167-4048
Status: published
Keywords: Vulnerability Detection, Assembly Language, Machine Learning
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program theme: Technology for Space Systems
DLR - Research area: Space
DLR - Research field: R SY - Technology for Space Systems
DLR - Research theme (project): R - Intelligent analyses and methods for secure software development
Location: Jena
Institutes & institutions: Institute of Data Science > Data Acquisition and Mobilisation
Institute of Data Science
Deposited by: Brust, Dr. Clemens-Alexander
Deposited on: 14 Apr 2023 11:36
Last modified: 01 Dec 2023 09:02
