On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds

11/16/2021
by Dominik Scheinert, et al.

With the growing amount of data, data processing workloads and the management of their resource usage become increasingly important. Since managing a dedicated infrastructure is in many situations infeasible or uneconomical, users increasingly execute their workloads in the cloud. As the configuration of workloads and resources is often challenging, various methods have been proposed that either quickly profile towards a good configuration or determine one based on data from previous runs. Still, performance data to train such methods is often lacking and costly to collect. In this paper, we propose a collaborative approach for sharing anonymized workload execution traces among users, mining them for general patterns, and exploiting clusters of historical workloads for future optimizations. We evaluate our prototype implementation for mining workload execution graphs on a publicly available trace dataset and demonstrate the predictive value of workload clusters determined through traces only.
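The idea of clustering historical workloads by their execution graphs and reusing a cluster for prediction can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the trace format (`TRACES`), the three structural features, the exact-match grouping, and the nearest-cluster mean are simple stand-ins, not the mining or clustering method of the paper, and the runtimes are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical trace format: each workload is a DAG given as
# {task: [upstream tasks]} plus an observed runtime in seconds.
# All names and numbers are made up for illustration.
TRACES = {
    "job_a": ({"m1": [], "m2": [], "r1": ["m1", "m2"]}, 120.0),
    "job_b": ({"m1": [], "m2": [], "r1": ["m1", "m2"]}, 140.0),
    "job_c": ({"s1": [], "s2": ["s1"], "s3": ["s2"], "s4": ["s3"]}, 400.0),
    "job_d": ({"s1": [], "s2": ["s1"], "s3": ["s2"], "s4": ["s3"]}, 360.0),
}


def graph_features(dag):
    """Return (task count, edge count, longest path length in tasks)."""
    depth = {}

    def longest(task):
        # Memoized longest chain of tasks ending at `task`.
        if task not in depth:
            depth[task] = 1 + max((longest(p) for p in dag[task]), default=0)
        return depth[task]

    n_edges = sum(len(parents) for parents in dag.values())
    return (len(dag), n_edges, max(longest(t) for t in dag))


def cluster_by_structure(traces):
    """Group workloads whose execution graphs share the same feature tuple."""
    clusters = defaultdict(list)
    for name, (dag, runtime) in traces.items():
        clusters[graph_features(dag)].append((name, runtime))
    return clusters


def predict_runtime(dag, clusters):
    """Estimate runtime as the mean runtime of the structurally closest cluster."""
    feats = graph_features(dag)

    def distance(key):
        # Squared Euclidean distance between feature tuples.
        return sum((a - b) ** 2 for a, b in zip(key, feats))

    nearest = min(clusters, key=distance)
    return mean(rt for _, rt in clusters[nearest])


clusters = cluster_by_structure(TRACES)
new_job = {"a": [], "b": [], "c": ["a", "b"]}  # unseen workload, structure only
estimate = predict_runtime(new_job, clusters)
```

Here the unseen job is matched purely on graph structure, without any profiling run of its own, which mirrors the notion of clusters "determined through traces only". A realistic setup would use richer features and a proper clustering algorithm.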

