DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

Damme, Patrick and Birkenbach, Marius and Bitsakos, Constantinos and Boehm, Matthias and Bonnet, Philippe and Ciorba, Florina M. and Dokter, Mark and Dowgiallo, Pawel and Eleliemy, Ahmed and Faerber, Christian and Goumas, Georgios I. and Habich, Dirk and Hedam, Niclas and Hofer, Marlies and Huang, Wenjun and Innerebner, Kevin and Karakostas, Vasileios and Kern, Roman and Kosar, Tomaz and Krause, Alexander and Krems, Daniel and Laber, Andreas and Lehner, Wolfgang and Mier, Eric and Paradies, Marcus and Peischl, Bernhard and Poerwawinata, Gabrielle and Psomadakis, Stratos and Rabl, Tilmann and Ratuszniak, Piotr and Silva, Pedro and Skuppin, Nikolai and Starzacher, Andreas and Steinwender, Benjamin and Tolovski, Ilin and Tözün, Pinar and Ulatowski, Wojciech and Wang, Yuanyuan and Wrosz, Izajasz P. and Zamuda, Ales and Zhang, Ce and Zhu, Xiaoxiang (2022) DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines. In: 12th Annual Conference on Innovative Data Systems Research, CIDR 2022. CIDR 2022, 2022-01-09 - 2022-01-12, Chaminade, US.

Full text not available from this repository.

Abstract

Integrated data analysis (IDA) pipelines, which combine data management (DM) and query processing, high-performance computing (HPC), and machine learning (ML) training and scoring, are becoming increasingly common in practice. Interestingly, systems in these areas share many compilation and runtime techniques, and the increasingly heterogeneous hardware infrastructure they use is converging as well. Yet the programming paradigms, cluster resource management, data formats and representations, and execution strategies differ substantially. DAPHNE is an open and extensible system infrastructure for such IDA pipelines, including language abstractions, compilation and runtime techniques, multi-level scheduling, hardware (HW) accelerators, and computational storage, aimed at increasing productivity and eliminating unnecessary overheads. In this paper, we make a case for IDA pipelines and describe the overall DAPHNE system architecture, its key components, and the design of a vectorized execution engine for computational storage, HW accelerators, and local and distributed operations. Preliminary experiments that compare DAPHNE with MonetDB, Pandas, DuckDB, and TensorFlow show promising results.

Item URL in elib: https://elib.dlr.de/189734/
Document Type: Conference or Workshop Item (Speech)
Title: DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines
Authors:
Authors | Institution or Email of Authors | Author's ORCID iD | ORCID Put Code
Damme, Patrick | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Birkenbach, Marius | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Bitsakos, Constantinos | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Boehm, Matthias | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Bonnet, Philippe | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Ciorba, Florina M. | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Dokter, Mark | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Dowgiallo, Pawel | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Eleliemy, Ahmed | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Faerber, Christian | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Goumas, Georgios I. | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Habich, Dirk | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Hedam, Niclas | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Hofer, Marlies | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Huang, Wenjun | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Innerebner, Kevin | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Karakostas, Vasileios | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Kern, Roman | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Kosar, Tomaz | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Krause, Alexander | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Krems, Daniel | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Laber, Andreas | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Lehner, Wolfgang | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Mier, Eric | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Paradies, Marcus | UNSPECIFIED | https://orcid.org/0000-0002-5743-6580 | UNSPECIFIED
Peischl, Bernhard | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Poerwawinata, Gabrielle | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Psomadakis, Stratos | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Rabl, Tilmann | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Ratuszniak, Piotr | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Silva, Pedro | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Skuppin, Nikolai | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Starzacher, Andreas | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Steinwender, Benjamin | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Tolovski, Ilin | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Tözün, Pinar | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Ulatowski, Wojciech | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Wang, Yuanyuan | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Wrosz, Izajasz P. | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Zamuda, Ales | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Zhang, Ce | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Zhu, Xiaoxiang | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Date: 2022
Journal or Publication Title: 12th Annual Conference on Innovative Data Systems Research, CIDR 2022
Refereed publication: Yes
Open Access: No
Gold Open Access: No
In SCOPUS: Yes
In ISI Web of Science: No
Status: Published
Keywords: big data
Event Title: CIDR 2022
Event Location: Chaminade, US
Event Type: International Conference
Event Start Date: 9 January 2022
Event End Date: 12 January 2022
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Earth Observation
DLR - Research area: Space (Raumfahrt)
DLR - Program: R EO - Earth Observation
DLR - Research theme (Project): R - Project Big Data
Location: Jena
Institutes and Institutions: Institute of Data Science > Data Management and Enrichment
Deposited By: Paradies, Dr.-Ing. Marcus
Deposited On: 17 Nov 2022 15:37
Last Modified: 12 Jul 2024 09:12
