Juggler: Autonomous Cost Optimization and Performance Prediction of Big Data Applications

Al-Sayeh, Hani and Memishi, Bunjamin and Jibril, Muhammad Attahir and Paradies, Marcus and Sattler, Kai-Uwe (2022) Juggler: Autonomous Cost Optimization and Performance Prediction of Big Data Applications. In: 2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022. SIGMOD 2022, 2022-06-12 - 2022-06-17, Philadelphia, US. doi: 10.1145/3514221.3517892. ISBN 978-145039249-5. ISSN 0730-8078.

Abstract

Distributed in-memory processing frameworks accelerate iterative workloads by caching suitable datasets in memory rather than recomputing them in each iteration. Selecting appropriate datasets to cache as well as allocating a suitable cluster configuration for caching these datasets play a crucial role in achieving optimal performance. In practice, both are tedious, time-consuming tasks and are often neglected by end users, who are typically not aware of workload semantics, sizes of intermediate data, and cluster specification. To address these problems, we present Juggler, an end-to-end framework, which autonomously selects appropriate datasets for caching and recommends a correspondingly suitable cluster configuration to end users, with the aim of achieving optimal execution time and cost. We evaluate Juggler on various iterative, real-world, machine learning applications. Compared with our baseline, Juggler reduces execution time to 25.1% and cost to 58.1%, on average, as a result of selecting suitable datasets for caching. It recommends optimal cluster configuration in 50% of cases and near-to-optimal configuration in the remaining cases. Moreover, Juggler achieves an average performance prediction accuracy of 90%.
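The motivation stated in the abstract, caching a dataset instead of recomputing it in every iteration, can be illustrated with a toy sketch. This is plain Python rather than Spark, and it is not Juggler's selection method; the function `run_iterations`, the squaring step that stands in for an expensive transformation, and the input sizes are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: plain Python, not Spark and not Juggler's method.
# All names and sizes here are hypothetical.

def run_iterations(raw, iterations, cache):
    """Run an iterative job over `raw`.

    Returns (result, builds), where `builds` counts how many times the
    derived dataset had to be computed.
    """
    builds = 0
    cached = None

    def derived():
        nonlocal builds, cached
        # Reuse the stored copy if caching is enabled and it already exists.
        if cache and cached is not None:
            return cached
        builds += 1
        data = [x * x for x in raw]  # stands in for an expensive transformation
        if cache:
            cached = data
        return data

    total = 0
    for _ in range(iterations):
        total += sum(derived())  # every iteration reads the derived dataset
    return total, builds


raw = list(range(1000))
print(run_iterations(raw, 10, cache=False))  # derived dataset rebuilt per iteration
print(run_iterations(raw, 10, cache=True))   # derived dataset built once, then reused
```

Both calls return the same result; only the number of times the derived dataset is built differs. In a real cluster the cached copy also occupies memory, which is why the abstract treats dataset selection and cluster configuration as linked decisions.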

Item URL in elib:https://elib.dlr.de/189731/
Document Type:Conference or Workshop Item (Speech)
Title:Juggler: Autonomous Cost Optimization and Performance Prediction of Big Data Applications
Authors:
Al-Sayeh, Hani
Memishi, Bunjamin (ORCID iD: https://orcid.org/0000-0003-3557-3426)
Jibril, Muhammad Attahir
Paradies, Marcus (ORCID iD: https://orcid.org/0000-0002-5743-6580)
Sattler, Kai-Uwe
Date:2022
Journal or Publication Title:2022 ACM SIGMOD International Conference on the Management of Data, SIGMOD 2022
Refereed publication:Yes
Open Access:Yes
Gold Open Access:No
In SCOPUS:Yes
In ISI Web of Science:Yes
DOI:10.1145/3514221.3517892
ISSN:0730-8078
ISBN:978-145039249-5
Status:Published
Keywords:performance prediction, cost optimization, apache spark, big data
Event Title:SIGMOD 2022
Event Location:Philadelphia, US
Event Type:International Conference
Event Start Date:12 June 2022
Event End Date:17 June 2022
Organizer:ACM
HGF - Research field:Aeronautics, Space and Transport
HGF - Program:Space
HGF - Program Themes:Earth Observation
DLR - Research area:Space
DLR - Program:R EO - Earth Observation
DLR - Research theme (Project):R - Project Big Data
Location: Jena
Institutes and Institutions:Institute of Data Science > Data Management and Enrichment
Deposited By: Paradies, Dr.-Ing. Marcus
Deposited On:17 Nov 2022 15:35
Last Modified:24 Apr 2024 20:50
