Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters

Al-Sayeh, Hani and Memishi, Bunjamin and Paradies, Marcus and Sattler, Kai-Uwe (2020) Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters. In: The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA). Very Large Data Base Endowment Inc. (VLDB Endowment). The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA), 2020-08-31, Tokyo, Japan.

PDF - accessible only within DLR (328 kB)

Abstract

Today, data-intensive systems in multi-dimensional domains are deployed with insufficient knowledge of the data, application internals, and infrastructure requirements. In addition, current performance prediction frameworks focus on predicting the performance of data-intensive applications on mid- to large-scale infrastructures, which is not always the deployment scenario. We reproduced 16 applications on a small-scale cluster and obtained concerning results from a baseline prediction framework. Consequently, we argue that neither the previous design of experiments nor the prediction models are sufficiently accurate in resource-constrained cluster scenarios. Therefore, we propose Masha, a new black-box, sampling-based approach that is led initially by a new design of experiments and does not rely on any historical executions. This is followed by a new performance prediction model, whose main idea is that, apart from the computation, the data also needs a first-class citizen role. Our preliminary results are promising: Masha is able to characterize complex applications, achieves an average prediction accuracy of 83%, and incurs a negligible overhead cost of only 2.42%. Being framework-independent, Masha is applicable to any data-intensive distributed system.

elib URL of the record: https://elib.dlr.de/137362/
Document type: Conference paper (Other)
Title: Masha: Sampling-Based Performance Prediction of Big Data Applications in Resource-Constrained Clusters
Authors:
Author; Institution or e-mail address; Author ORCID iD; ORCID Put Code
Al-Sayeh, Hani; hani-bassam.al-sayeh (at) tu-ilmenau.de; not specified; not specified
Memishi, Bunjamin; Bunjamin.Memishi (at) dlr.de; https://orcid.org/0000-0003-3557-3426; not specified
Paradies, Marcus; Marcus.Paradies (at) dlr.de; https://orcid.org/0000-0002-5743-6580; not specified
Sattler, Kai-Uwe; kus (at) tu-ilmenau.de; not specified; not specified
Date: August 2020
Published in: The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA)
Refereed publication: Yes
Open Access: No
Gold Open Access: No
In SCOPUS: No
In ISI Web of Science: No
Publisher: Very Large Data Base Endowment Inc. (VLDB Endowment)
Status: published
Keywords: sampling, performance prediction, resource-constrained cluster, big data applications
Event title: The 1st Workshop on Distributed Infrastructure, Systems, Programming and AI (DISPA)
Event location: Tokyo, Japan
Event type: Workshop
Event date: 31 August 2020
Organizer: VLDB Endowment
HGF - Research field: no assignment
HGF - Program: no assignment
HGF - Program topic: no assignment
DLR - Focus area: no assignment
DLR - Research area: no assignment
DLR - Sub-area (project, undertaking): no assignment, R - no assignment
Location: Jena
Institutes & facilities: Institut für Datenwissenschaften
Institut für Datenwissenschaften > Datenmanagement und Analyse
Deposited by: Memishi, Dr. Bunjamin
Deposited on: 13 Nov 2020 14:21
Last modified: 24 Apr 2024 20:39
