Al-Sayeh, Hani and Memishi, Bunjamin and Paradies, Marcus and Sattler, Kai-Uwe (2020) Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters. In: The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA). Very Large Data Base Endowment Inc. (VLDB Endowment). The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA), 31 August 2020, Tokyo, Japan.
PDF (328kB) - Only accessible within DLR
Abstract
Nowadays, data-intensive systems in multi-dimensional domains are deployed with insufficient knowledge of the data, application internals, and infrastructure requirements. In addition, current performance prediction frameworks focus on predicting the performance of data-intensive applications on mid- to large-scale infrastructures, which does not always seem to be the deployment scenario in practice. We reproduced 16 applications on a small-scale cluster and obtained concerning results from a baseline prediction framework. Consequently, we argue that neither the previous design of experiments nor the prediction models are sufficiently accurate in resource-constrained cluster scenarios. Therefore, we propose Masha, a new, black-box, sampling-based approach that is initially led by a new design of experiments and does not rely on any historical executions. This is followed by a new performance prediction model, whose main idea is that, apart from the computation, the data also needs a first-class citizen role. Our preliminary results are promising: Masha is able to characterize complex applications, achieves an average prediction accuracy of 83%, and incurs a negligible overhead cost of only 2.42%. Being framework-independent, Masha is applicable to any data-intensive distributed system.
Item URL in elib: | https://elib.dlr.de/137362/
---|---
Document Type: | Conference or Workshop Item (Other)
Title: | Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters
Authors: | Al-Sayeh, Hani; Memishi, Bunjamin; Paradies, Marcus; Sattler, Kai-Uwe
Date: | August 2020
Journal or Publication Title: | The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA)
Refereed publication: | Yes
Open Access: | No
Gold Open Access: | No
In SCOPUS: | No
In ISI Web of Science: | No
Publisher: | Very Large Data Base Endowment Inc. (VLDB Endowment)
Status: | Published
Keywords: | sampling, performance prediction, resource-constrained cluster, big data applications
Event Title: | The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA)
Event Location: | Tokyo, Japan
Event Type: | Workshop
Event Dates: | 31 August 2020
Organizer: | VLDB Endowment
HGF - Research field: | other
HGF - Program: | other
HGF - Program Themes: | other
DLR - Research area: | no assignment
DLR - Program: | no assignment
DLR - Research theme (Project): | no assignment, R - no assignment
Location: | Jena
Institutes and Institutions: | Institute of Data Science; Institute of Data Science > Datamanagement and Analysis
Deposited By: | Memishi, Dr. Bunjamin
Deposited On: | 13 Nov 2020 14:21
Last Modified: | 13 Nov 2020 14:21