Tuneful: An Online Significance-Aware Configuration Tuner for Big Data Analytics

by Ayat Fekry, et al.

Distributed analytics engines such as Spark are a common choice for processing extremely large datasets. However, finding good configurations for these systems remains challenging, with each workload potentially requiring a different setup to run optimally. Using suboptimal configurations incurs significant extra runtime costs. These systems are also gaining traction within data science communities, where awareness of such issues is relatively low. We propose Tuneful, an approach that efficiently tunes the configuration of in-memory cluster computing systems. Tuneful combines incremental Sensitivity Analysis and Bayesian optimization to identify near-optimal configurations from a high-dimensional search space using a small number of executions. This setup allows the tuning to be done online, without any prior training. Our experimental results show that Tuneful reduces the search time for finding close-to-optimal configurations by 62% (at the median) compared to existing state-of-the-art techniques. As a result, the tuning cost is amortized significantly faster, enabling practical tuning for new classes of workloads.
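
The general idea of significance-aware tuning (first find the few parameters that matter, then search only over those) can be illustrated with a toy sketch. This is not Tuneful's algorithm: a simple one-at-a-time sensitivity probe stands in for the incremental Sensitivity Analysis, and seeded random search stands in for Bayesian optimization. The parameter names, ranges, and the `toy_runtime` cost function are all hypothetical and chosen only so the example runs without a cluster.

```python
import random

# Hypothetical search space (illustrative names and ranges, not from the paper).
SPACE = {
    "executor_memory_gb": (1, 16),
    "executor_cores": (1, 8),
    "shuffle_partitions": (8, 512),
    "compress_level": (1, 9),
}


def toy_runtime(cfg):
    """Stand-in for executing a workload; two parameters dominate by design."""
    return (
        100.0
        + 3.0 * (cfg["executor_memory_gb"] - 12) ** 2    # strong effect
        + 0.002 * (cfg["shuffle_partitions"] - 200) ** 2  # strong effect
        + 0.1 * cfg["executor_cores"]                     # weak effect
        + 0.05 * cfg["compress_level"]                    # weak effect
    )


def grid(lo, hi, n=4):
    """n evenly spaced integer values from lo to hi, inclusive."""
    return [round(lo + i * (hi - lo) / (n - 1)) for i in range(n)]


def rank_parameters(space, cost, defaults):
    """One-at-a-time sensitivity: vary one parameter, hold the rest at defaults."""
    effect = {}
    for name, (lo, hi) in space.items():
        costs = [cost({**defaults, name: v}) for v in grid(lo, hi)]
        effect[name] = max(costs) - min(costs)
    return sorted(effect, key=effect.get, reverse=True)


def tune(space, cost, top_k=2, budget=30, seed=0):
    """Search only over the top_k most influential parameters."""
    rng = random.Random(seed)
    defaults = {n: (lo + hi) // 2 for n, (lo, hi) in space.items()}
    significant = rank_parameters(space, cost, defaults)[:top_k]
    best_cfg, best_cost = defaults, cost(defaults)
    for _ in range(budget):
        cand = dict(defaults)
        for name in significant:
            lo, hi = space[name]
            cand[name] = rng.randint(lo, hi)  # random search, not Bayesian optimization
        c = cost(cand)
        if c < best_cost:
            best_cfg, best_cost = cand, c
    return significant, best_cfg, best_cost
```

Calling `tune(SPACE, toy_runtime)` returns the parameters judged significant, the best configuration found, and its cost. Restricting the search to the influential parameters shrinks the space that each execution has to explore, which is the intuition behind needing only a small number of executions.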


Scout: An Experienced Guide to Find the Best Cloud Configuration

Finding the right cloud configuration for workloads is an essential step...

Ruya: Memory-Aware Iterative Optimization of Cluster Configurations for Big Data Processing

Selecting appropriate computational resources for data processing jobs o...

LOCAT: Low-Overhead Online Configuration Auto-Tuning of Spark SQL Applications

Spark SQL has been widely deployed in industry but it is challenging to ...

Learning-based Automatic Parameter Tuning for Big Data Analytics Frameworks

Big data analytics frameworks (BDAFs) have been widely used for data pro...

Fast and Low-cost Search for Efficient Cloud Configurations for HPC Workloads

The use of cloud computational resources has become increasingly importa...

Micky: A Cheaper Alternative for Selecting Cloud Instances

Most cloud computing optimizers explore and improve one workload at a ti...

Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big Data Analytics

Choosing a good resource configuration for big data analytics applicatio...