Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters

Al-Sayeh, Hani and Memishi, Bunjamin and Paradies, Marcus and Sattler, Kai-Uwe (2020) Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters. In: The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA). Very Large Data Base Endowment Inc. (VLDB Endowment). The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA), 2020-08-31, Tokyo, Japan.

PDF - Accessible only within DLR
328 kB

Abstract

Nowadays, data-intensive systems in multi-dimensional domains are deployed with insufficient knowledge of the data, application internals, and infrastructure requirements. In addition, current performance prediction frameworks focus on predicting the performance of data-intensive applications on mid- to large-scale infrastructures, a scale that does not always seem to match actual deployments. We reproduced 16 applications on a small-scale cluster and obtained concerning results from a baseline prediction framework. Consequently, we argue that neither the previous design of experiments nor the prediction models are sufficiently accurate in resource-constrained cluster scenarios. Therefore, we propose Masha, a new black-box, sampling-based approach that is initially led by a new design of experiments and does not rely on any historical executions. This is followed by a new performance prediction model, whose main idea is that, apart from the computation, the data also deserves a first-class citizen role. Our preliminary results are promising: Masha is able to characterize complex applications, achieves an average prediction accuracy of 83%, and incurs a negligible overhead of only 2.42%. Being framework-independent, Masha is applicable to any data-intensive distributed system.

elib URL of this record:

https://elib.dlr.de/137362/

Document type:

Conference contribution (Other)

Title:

Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters

Authors:

Authors	Institution or e-mail address	Author ORCID iD	ORCID Put Code
Al-Sayeh, Hani	hani-bassam.al-sayeh (at) tu-ilmenau.de	NOT SPECIFIED	NOT SPECIFIED
Memishi, Bunjamin	Bunjamin.Memishi (at) dlr.de	https://orcid.org/0000-0003-3557-3426	NOT SPECIFIED
Paradies, Marcus	Marcus.Paradies (at) dlr.de	https://orcid.org/0000-0002-5743-6580	NOT SPECIFIED
Sattler, Kai-Uwe	kus (at) tu-ilmenau.de	NOT SPECIFIED	NOT SPECIFIED

Date:

August 2020

Published in:

The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA)

Refereed publication:

Open Access:

No

Gold Open Access:

No

In SCOPUS:

No

In ISI Web of Science:

No

Publisher:

Very Large Data Base Endowment Inc. (VLDB Endowment)

Status:

published

Keywords:

sampling, performance prediction, resource-constrained cluster, big data applications

Event title:

The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA)

Event location:

Tokyo, Japan

Event type:

Workshop

Event date:

31 August 2020

Organizer:

VLDB Endowment

HGF - Research field:

no assignment

HGF - Program:

no assignment

HGF - Program theme:

no assignment

DLR - Research area:

no assignment

DLR - Program:

no assignment

DLR - Research theme (Project):

no assignment, R - no assignment

Location:

Jena

Institutes and Institutions:

Institut für Datenwissenschaften
Institut für Datenwissenschaften > Datenmanagement und Analyse

Deposited by:

Memishi, Dr. Bunjamin

Deposited on:

13 Nov 2020 14:21

Last modified:

24 Apr 2024 20:39
