
Traceability and Reproducibility of Big Data Analytics Workflows using Provenance

Schreiber, Andreas (2017) Traceability and Reproducibility of Big Data Analytics Workflows using Provenance. 2nd European GeoInformation Symposium and Exposition, 20-22 June 2017, Berlin, Germany.

Official URL: https://www.afcea.org/event/?q=GEO17

Abstract

The provenance of data provides detailed information about the origin of that data. That includes information about ownership and about the actions and modifications performed on the data. With provenance information, data become traceable and reproducible. In data science, results that are not reproducible by peer scientists are valueless and of no significance. In engineering, users can be more confident in the quality of products that are developed based on simulations and data analytics workflows. To specify and store provenance information, the W3C has standardized the provenance model PROV. Using PROV and associated implementations, users can record the provenance of data analytics processes. The recorded provenance takes the form of directed acyclic graphs, which can be analyzed to gain insight into the data analytics processes. The talk describes the architecture of provenance management and how to apply provenance recording and provenance analytics to data science and big data analytics workflows.
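
To make the idea of provenance as a directed acyclic graph concrete, the following is a minimal sketch in plain Python. It is not the implementation presented in the talk and does not use any PROV library; the class `ProvGraph`, the function `build_example`, and the workflow node names (`raw.csv`, `clean`, `train`, `model`, `analyst`) are hypothetical and chosen only for illustration. The relation names (`used`, `wasGeneratedBy`, `wasAssociatedWith`, and so on) are taken from the W3C PROV data model.

```python
from collections import defaultdict

# Relation names from the W3C PROV data model. Each edge points from an
# element to the element it depends on, so edges lead "back in time".
RELATIONS = {
    "used",
    "wasGeneratedBy",
    "wasDerivedFrom",
    "wasAssociatedWith",
    "wasAttributedTo",
}


class ProvGraph:
    """Hypothetical in-memory provenance graph (illustration only)."""

    def __init__(self):
        self.kinds = {}                 # node id -> "entity" | "activity" | "agent"
        self.edges = defaultdict(list)  # node id -> [(relation, target id)]

    def add(self, node_id, kind):
        self.kinds[node_id] = kind

    def relate(self, source, relation, target):
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        if source not in self.kinds or target not in self.kinds:
            raise KeyError("both nodes must be added before relating them")
        self.edges[source].append((relation, target))

    def lineage(self, node_id):
        """Return the set of all nodes that node_id depends on, directly or indirectly."""
        visited = set()
        stack = [node_id]
        while stack:
            current = stack.pop()
            for _relation, target in self.edges.get(current, []):
                if target not in visited:
                    visited.add(target)
                    stack.append(target)
        return visited


def build_example():
    """Record a two-step analytics workflow: clean raw data, then train a model."""
    g = ProvGraph()
    g.add("raw.csv", "entity")
    g.add("clean.csv", "entity")
    g.add("model", "entity")
    g.add("clean", "activity")
    g.add("train", "activity")
    g.add("analyst", "agent")

    g.relate("clean", "used", "raw.csv")
    g.relate("clean", "wasAssociatedWith", "analyst")
    g.relate("clean.csv", "wasGeneratedBy", "clean")
    g.relate("train", "used", "clean.csv")
    g.relate("model", "wasGeneratedBy", "train")
    return g


graph = build_example()
```

Calling `graph.lineage("model")` walks the edges backwards from the trained model and returns every activity, input, and agent it depends on. This is the kind of traceability question that provenance analytics is meant to answer. A real system would more likely store such graphs in a PROV serialization or a graph database than in a Python dictionary.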

Item URL in elib: https://elib.dlr.de/113545/
Document Type: Conference or Workshop Item (Speech)
Title: Traceability and Reproducibility of Big Data Analytics Workflows using Provenance
Authors:
  Author: Schreiber, Andreas
  Institution or Email of Author: Andreas.Schreiber (at) dlr.de
  Author's ORCID iD: https://orcid.org/0000-0001-5750-5649
Date: 21 June 2017
Refereed publication: No
Open Access: Yes
Gold Open Access: No
In SCOPUS: No
In ISI Web of Science: No
Status: Published
Keywords: big data, data science, reproducibility, traceability, provenance
Event Title: 2nd European GeoInformation Symposium and Exposition
Event Location: Berlin, Germany
Event Type: International Conference
Event Dates: 20-22 June 2017
Organizer: AFCEA International
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Space Technology
DLR - Research area: Space
DLR - Program: R SY - Space System Technology
DLR - Research theme (Project): R - Project SISTEC
Location: Köln-Porz
Institutes and Institutions: Institute of Simulation and Software Technology
Institute of Simulation and Software Technology > Distributed Systems and Component Software
Deposited By: Schreiber, Andreas
Deposited On: 07 Nov 2017 13:55
Last Modified: 31 Jul 2019 20:11
