The Workflow Trace Archive: Open-Access Data from Public and Private Computing Infrastructures -- Technical Report

06/18/2019
by Laurens Versluis, et al.

Realistic, relevant, and reproducible experiments often need input traces collected from real-world environments. In this work, we focus on traces of workflows, which are common in datacenters, clouds, and HPC infrastructures. We show that the state of the art in using workflow traces raises two important issues: (1) realistic traces are used infrequently, and (2) realistic, open-access traces are used even less often. To alleviate these issues, we introduce the Workflow Trace Archive (WTA), an open-access archive of workflow traces from diverse computing infrastructures, together with tooling to parse, validate, and analyze traces. The WTA includes >48 million workflows captured from >10 computing infrastructures, representing a broad diversity of trace domains and characteristics. To emphasize the importance of trace diversity, we characterize the WTA contents and analyze in simulation the impact of trace diversity on experiment results. Our results indicate significant differences in characteristics, properties, and workflow structures between workload sources, domains, and fields.
