Tuneful: An Online Significance-Aware Configuration Tuner for Big Data Analytics

01/22/2020
by Ayat Fekry, et al.

Distributed analytics engines such as Spark are a common choice for processing extremely large datasets. However, finding good configurations for these systems remains challenging, with each workload potentially requiring a different setup to run optimally. Using suboptimal configurations incurs significant extra runtime costs. These systems are also gaining traction within data science communities, where awareness of such issues is relatively low. We propose Tuneful, an approach that efficiently tunes the configuration of in-memory cluster computing systems. Tuneful combines incremental sensitivity analysis and Bayesian optimization to identify near-optimal configurations in a high-dimensional search space using a small number of executions. This setup allows the tuning to be done online, without any previous training. Our experimental results show that Tuneful reduces the search time for finding close-to-optimal configurations by 62% (at the median) compared to existing state-of-the-art techniques. As a result, the tuning cost is amortized significantly faster, enabling practical tuning for new classes of workloads.
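
The idea of first narrowing a high-dimensional search space to the parameters that matter, and then searching only that reduced space, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not Tuneful's implementation: one-at-a-time screening stands in for incremental sensitivity analysis, plain random search stands in for Bayesian optimization, and the parameter names, ranges, threshold, budget, and the synthetic cost function `run_workload` are all hypothetical and invented for illustration.

```python
import random

# Hypothetical configuration space: parameter name -> (low, high).
SPACE = {
    "executor_memory_gb": (1.0, 16.0),
    "executor_cores": (1.0, 8.0),
    "shuffle_partitions": (50.0, 1000.0),
    "io_buffer_kb": (16.0, 256.0),
}


def run_workload(cfg):
    """Synthetic stand-in for one job execution; returns a runtime in seconds.

    In this toy model only two of the four parameters affect the runtime.
    """
    mem = cfg["executor_memory_gb"]
    cores = cfg["executor_cores"]
    return 100.0 + (mem - 12.0) ** 2 + 5.0 * (cores - 6.0) ** 2


def midpoint(space):
    """Return the configuration at the centre of every parameter range."""
    return {name: (lo + hi) / 2.0 for name, (lo, hi) in space.items()}


def screen_significant(space, cost, threshold=1.0):
    """Phase 1 (simplified): one-at-a-time sensitivity screening.

    Each parameter is moved across its range while the others stay at
    their midpoints; it counts as significant if the cost spread exceeds
    the threshold.
    """
    base = midpoint(space)
    significant = []
    for name, (lo, hi) in space.items():
        costs = [cost({**base, name: v}) for v in (lo, (lo + hi) / 2.0, hi)]
        if max(costs) - min(costs) > threshold:
            significant.append(name)
    return significant


def tune(space, cost, budget=30, seed=0):
    """Phase 2 (simplified): random search over significant parameters only.

    Insignificant parameters are kept fixed at their midpoints.
    """
    rng = random.Random(seed)
    significant = screen_significant(space, cost)
    best_cfg = midpoint(space)
    best_cost = cost(best_cfg)
    for _ in range(budget):
        candidate = dict(midpoint(space))
        for name in significant:
            lo, hi = space[name]
            candidate[name] = rng.uniform(lo, hi)
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best_cfg, best_cost = candidate, candidate_cost
    return significant, best_cfg, best_cost


if __name__ == "__main__":
    significant, best_cfg, best_cost = tune(SPACE, run_workload)
    print("significant parameters:", significant)
    print("best configuration found:", best_cfg)
    print("best runtime (s):", round(best_cost, 2))
```

In this sketch the search budget is spent on two dimensions instead of four. A real tuner would replace the random search with a model-guided search such as Bayesian optimization and would measure actual job executions rather than a synthetic function.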
