Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters

Al-Sayeh, Hani and Memishi, Bunjamin and Paradies, Marcus and Sattler, Kai-Uwe (2020) Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters. In: The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA). Very Large Data Base Endowment Inc. (VLDB Endowment). The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA), 31 August 2020, Tokyo, Japan.

Abstract

Data-intensive systems in multi-dimensional domains are nowadays deployed with insufficient knowledge of the data, the application internals, and the infrastructure requirements. In addition, current performance prediction frameworks focus on predicting the performance of data-intensive applications on mid- to large-scale infrastructures, which is not always the deployment scenario. We reproduced 16 applications on a small-scale cluster and obtained concerning results from a baseline prediction framework. Consequently, we argue that neither the previous design of experiments nor the prediction models are sufficiently accurate in resource-constrained cluster scenarios. Therefore, we propose Masha, a new, black-box, sampling-based approach that starts with a new design of experiments and does not rely on any historical executions. This is followed by a new performance prediction model, whose main idea is that, apart from the computation, the data also needs to be treated as a first-class citizen. Our preliminary results are promising: Masha is able to characterize complex applications, with an average prediction accuracy of 83% and a negligible overhead cost of only 2.42%. Being framework-independent, Masha is applicable to any data-intensive distributed system.

Item URL in elib:https://elib.dlr.de/137362/
Document Type:Conference or Workshop Item (Other)
Title:Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters
Authors:
Authors | Institution or Email of Authors | Author's ORCID iD | ORCID Put Code
Al-Sayeh, Hani | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Memishi, Bunjamin | UNSPECIFIED | https://orcid.org/0000-0003-3557-3426 | UNSPECIFIED
Paradies, Marcus | UNSPECIFIED | https://orcid.org/0000-0002-5743-6580 | UNSPECIFIED
Sattler, Kai-Uwe | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Date:August 2020
Journal or Publication Title:The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA)
Refereed publication:Yes
Open Access:No
Gold Open Access:No
In SCOPUS:No
In ISI Web of Science:No
Publisher:Very Large Data Base Endowment Inc. (VLDB Endowment)
Status:Published
Keywords:sampling, performance prediction, resource-constrained cluster, big data applications
Event Title:The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA)
Event Location:Tokyo, Japan
Event Type:Workshop
Event Dates:31 August 2020
Organizer:VLDB Endowment
HGF - Research field:other
HGF - Program:other
HGF - Program Themes:other
DLR - Research area:no assignment
DLR - Program:no assignment
DLR - Research theme (Project):no assignment, R - no assignment
Location: Jena
Institutes and Institutions:Institute of Data Science
Institute of Data Science > Data Management and Analysis
Deposited By: Memishi, Dr. Bunjamin
Deposited On:13 Nov 2020 14:21
Last Modified:13 Nov 2020 14:21
