Stäbler, Maximilian and Lange, Markus and Kipper, Samir and Langdon, Christoph and Köster, Frank (2025) GC-DAM: Graph and Contextual Embeddings for Heterogeneous Data Asset Matching. CEUR-WS. Extended Semantic Web Conference (ESWC 2025), 2025-06-01 - 2025-06-06, Portorož, Slovenia. ISSN 1613-0073.
PDF (5 MB)
Official URL: https://ceur-ws.org/Vol-4007/
Abstract
Data assets—such as datasets, data services, APIs, algorithms, and analytical models—are valuable digital resources that organizations use to create value, support decision-making, and optimize business processes. Matching and integrating these assets, despite differences in semantic languages, ontologies, or schemas, is essential for building scalable and interoperable dataspaces. However, existing approaches often focus solely on semantic similarities, overlooking structurally similar assets from other domains that could be highly relevant. To address this gap, we present Graph and Contextual Embeddings for Heterogeneous Data Asset Matching (GC-DAM). GC-DAM employs two embedding strategies to match data assets based on both semantic and structural attributes. Structural (morphological) features are automatically incorporated into a knowledge graph, enabling the identification of assets that are structurally similar to a query but may originate from different domains, while metadata descriptions capture the semantic (contextual) features. This dual approach overcomes the limitations of methods that rely solely on semantic descriptions. We validate our approach against a custom dataset of 10,000 Kaggle data assets. Our multimodal embedding achieves 77% agreement on our custom dataset, demonstrating its ability to identify structurally similar assets across diverse domains, even when they are semantically different. The dataset and code are publicly available to the research community.
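The dual matching idea in the abstract, one semantic (contextual) embedding and one structural (graph) embedding per asset, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the weighting parameter `alpha`, the linear combination of the two similarities, and the toy vectors are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_score(query, asset, alpha=0.5):
    """Blend semantic and structural similarity.

    alpha is a hypothetical weight: 1.0 uses only the semantic embedding,
    0.0 uses only the structural embedding.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    semantic = cosine(query["semantic"], asset["semantic"])
    structural = cosine(query["structural"], asset["structural"])
    return alpha * semantic + (1.0 - alpha) * structural


def rank_assets(query, assets, alpha=0.5):
    """Return the assets sorted from best to worst match for the query."""
    return sorted(assets, key=lambda a: combined_score(query, a, alpha), reverse=True)


# Toy embeddings (made up): one asset shares the query's topic but not its
# structure, the other shares its structure but comes from another domain.
query = {"name": "traffic_sensor_counts", "semantic": [1.0, 0.0], "structural": [1.0, 0.0]}
assets = [
    {"name": "traffic_report_text", "semantic": [1.0, 0.0], "structural": [0.0, 1.0]},
    {"name": "energy_meter_readings", "semantic": [0.0, 1.0], "structural": [1.0, 0.0]},
]

for alpha in (0.3, 0.8):
    best = rank_assets(query, assets, alpha=alpha)[0]
    print(alpha, best["name"])
```

With a low `alpha` the structurally similar asset from the other domain ranks first; with a high `alpha` the semantically similar one does. A purely semantic matcher corresponds to `alpha = 1.0` and would never surface the cross-domain asset, which is the limitation the abstract describes.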
elib URL of this record: | https://elib.dlr.de/215766/
---|---
Document type: | Conference contribution (talk)
Title: | GC-DAM: Graph and Contextual Embeddings for Heterogeneous Data Asset Matching
Authors: | Stäbler, Maximilian; Lange, Markus; Kipper, Samir; Langdon, Christoph; Köster, Frank
Date: | 7 August 2025
Peer-reviewed publication: | Yes
Open Access: | Yes
Gold Open Access: | No
In SCOPUS: | No
In ISI Web of Science: | No
Publisher: | CEUR-WS
ISSN: | 1613-0073
Status: | published
Keywords: | Multi-Modal-Embedding, Heterogeneous dataspaces, Knowledge Graphs, Automated Interoperability, LLM
Event title: | Extended Semantic Web Conference (ESWC 2025)
Event location: | Portorož, Slovenia
Event type: | international conference
Event start date: | 1 June 2025
Event end date: | 6 June 2025
HGF - Research field: | Aeronautics, Space and Transport
HGF - Program: | Transport
HGF - Program topic: | Transport System
DLR - Research area: | Transport
DLR - Research field: | V VS - Verkehrssystem (Transport System)
DLR - Research theme (project): | V - DiVe - Digital organisiertes Verkehrssystem (digitally organized transport system)
Location: | Ulm
Institutes and institutions: | Institut für KI-Sicherheit (Institute for AI Safety and Security)
Deposited by: | Stäbler, Maximilian
Deposited on: | 19 Aug 2025 13:34
Last modified: | 19 Aug 2025 13:34