Al-Sayeh, Hani and Memishi, Bunjamin and Paradies, Marcus and Sattler, Kai-Uwe (2020) Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters. In: The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA). Very Large Data Base Endowment Inc. (VLDB Endowment). The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA), 2020-08-31, Tokyo, Japan.
PDF
- Accessible only within DLR
328kB
Abstract
Nowadays, data-intensive systems in multi-dimensional domains are deployed with insufficient knowledge of the data, application internals, and infrastructure requirements. In addition, current performance prediction frameworks focus on predicting the performance of data-intensive applications on mid- to large-scale infrastructures, which does not always seem to be the setting in practice. We reproduced 16 applications on a small-scale cluster and obtained concerning results from a baseline prediction framework. Consequently, we argue that neither the previous design of experiments nor the prediction models are sufficiently accurate in resource-constrained cluster scenarios. Therefore, we propose Masha, a new black-box, sampling-based approach that starts with a new design of experiments and does not rely on any historical executions. This is followed by a new performance prediction model, whose main idea is that, apart from the computation, the data also deserves a first-class citizen role. Our preliminary results are promising: Masha is able to characterize complex applications, with an average prediction accuracy of 83% and a negligible overhead cost of only 2.42%. Being framework-independent, Masha is applicable to any data-intensive distributed system.
elib URL of the entry: | https://elib.dlr.de/137362/
---|---
Document type: | Conference contribution (Other)
Title: | Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters
Authors: | Al-Sayeh, Hani; Memishi, Bunjamin; Paradies, Marcus; Sattler, Kai-Uwe
Date: | August 2020
Published in: | The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA)
Refereed publication: | Yes
Open Access: | No
Gold Open Access: | No
In SCOPUS: | No
In ISI Web of Science: | No
Publisher: | Very Large Data Base Endowment Inc. (VLDB Endowment)
Status: | published
Keywords: | sampling, performance prediction, resource-constrained cluster, big data applications
Event title: | The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA)
Event location: | Tokyo, Japan
Event type: | Workshop
Event date: | 31 August 2020
Organizer: | VLDB Endowment
HGF - Research field: | no assignment
HGF - Program: | no assignment
HGF - Program topic: | no assignment
DLR - Research focus: | no assignment
DLR - Research area: | no assignment
DLR - Sub-area (project, undertaking): | no assignment, R - no assignment
Location: | Jena
Institutes & facilities: | Institut für Datenwissenschaften; Institut für Datenwissenschaften > Datenmanagement und Analyse
Deposited by: | Memishi, Dr. Bunjamin
Deposited on: | 13 Nov 2020 14:21
Last modified: | 24 Apr 2024 20:39