Karasu: A Collaborative Approach to Efficient Cluster Configuration for Big Data Analytics

08/22/2023
by   Dominik Scheinert, et al.
0

Selecting the right resources for big data analytics jobs is hard because of the wide variety of configuration options like machine type and cluster size. As poor choices can have a significant impact on resource efficiency, cost, and energy usage, automated approaches are gaining popularity. Most existing methods rely on profiling recurring workloads to find near-optimal solutions over time. Due to the cold-start problem, this often leads to lengthy and costly profiling phases. However, big data analytics jobs across users can share many common properties: they often operate on similar infrastructure, using similar algorithms implemented in similar frameworks. The potential in sharing aggregated profiling runs to collaboratively address the cold start problem is largely unexplored. We present Karasu, an approach to more efficient resource configuration profiling that promotes data sharing among users working with similar infrastructures, frameworks, algorithms, or datasets. Karasu trains lightweight performance models using aggregated runtime information of collaborators and combines them into an ensemble method to exploit inherent knowledge of the configuration search space. Moreover, Karasu allows the optimization of multiple objectives simultaneously. Our evaluation is based on performance data from diverse workload executions in a public cloud environment. We show that Karasu is able to significantly boost existing methods in terms of performance, search time, and cost, even when few comparable profiling runs are available that share only partial common characteristics with the target job.

READ FULL TEXT

page 1

page 7

page 8

research
11/15/2022

Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big Data Analytics

Choosing a good resource configuration for big data analytics applicatio...
research
01/22/2020

Tuneful: An Online Significance-Aware Configuration Tuner for Big Data Analytics

Distributed analytics engines such as Spark are a common choice for proc...
research
11/08/2022

Ruya: Memory-Aware Iterative Optimization of Cluster Configurations for Big Data Processing

Selecting appropriate computational resources for data processing jobs o...
research
05/07/2020

Boosting Cloud Data Analytics using Multi-Objective Optimization

Data analytics in the cloud has become an integral part of enterprise bu...
research
10/17/2019

ConEx: Efficient Exploration of Big-Data System Configurations for Better Performance

Configuration space complexity makes the big-data software systems hard ...
research
08/17/2018

Learning-based Automatic Parameter Tuning for Big Data Analytics Frameworks

Big data analytics frameworks (BDAFs) have been widely used for data pro...
research
07/05/2022

Blink: Lightweight Sample Runs for Cost Optimization of Big Data Applications

Distributed in-memory data processing engines accelerate iterative appli...

Please sign up or login with your details

Forgot password? Click here to reset