Learning-based Automatic Parameter Tuning for Big Data Analytics Frameworks

by Liang Bao, et al.

Big data analytics frameworks (BDAFs) are widely used for data processing applications. These frameworks expose a large number of configuration parameters, which makes tuning them overwhelming for users. To address this issue, many automatic tuning approaches have been proposed. However, generating enough samples in a high-dimensional parameter space within a time constraint remains a critical challenge. In this paper, we present AutoTune, an automatic parameter tuning system that aims to optimize application execution time on BDAFs. AutoTune first constructs a smaller-scale testbed from the production system so that it can generate more samples, and thus train a better prediction model, under a given time constraint. Furthermore, the AutoTune algorithm produces a set of samples that provides wide coverage of the high-dimensional parameter space, and searches for more promising configurations using the trained prediction model. AutoTune is implemented and evaluated using the Spark framework and the HiBench benchmark deployed on a public cloud. Extensive experimental results illustrate that AutoTune improves on default configurations by 63.70 the five state-of-the-art tuning algorithms by 6
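
The sample-then-search loop described in the abstract can be illustrated with a minimal sketch. The abstract does not name the sampling scheme or the prediction model, so Latin hypercube sampling and a k-nearest-neighbour predictor are used here as stand-ins, not as the authors' method. The parameter ranges, the synthetic cost function run_on_testbed, and all function names are hypothetical.

```python
import random

# Hypothetical tuning space: the keys resemble Spark settings, but the
# ranges are assumptions chosen only for illustration.
PARAM_RANGES = {
    "spark.executor.memory_gb": (1.0, 8.0),
    "spark.executor.cores": (1.0, 4.0),
    "spark.sql.shuffle.partitions": (50.0, 400.0),
}


def latin_hypercube(ranges, n_samples, rng):
    """Draw n_samples configurations with one point per stratum per dimension."""
    columns = {}
    for name, (low, high) in ranges.items():
        width = (high - low) / n_samples
        # One random point inside each equal-width stratum, then shuffle
        # so that strata are paired at random across dimensions.
        points = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(points)
        columns[name] = points
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]


def run_on_testbed(config):
    """Stand-in for running the job on the small testbed (synthetic cost)."""
    mem = config["spark.executor.memory_gb"]
    cores = config["spark.executor.cores"]
    parts = config["spark.sql.shuffle.partitions"]
    return 100.0 / (mem * cores) + abs(parts - 200.0) / 20.0


def normalize(config, ranges):
    """Scale every parameter to [0, 1] so distances are comparable."""
    return [(config[n] - lo) / (hi - lo) for n, (lo, hi) in ranges.items()]


def predict(config, history, ranges, k=3):
    """Estimate execution time as the mean of the k nearest measured samples."""
    x = normalize(config, ranges)

    def distance(other):
        return sum((a - b) ** 2 for a, b in zip(x, normalize(other, ranges)))

    nearest = sorted(history, key=lambda item: distance(item[0]))[:k]
    return sum(time for _, time in nearest) / len(nearest)


def tune(ranges, n_initial=20, n_candidates=200, n_rounds=5, seed=0):
    """Sample widely, fit on measurements, then measure the predicted-best candidates."""
    rng = random.Random(seed)
    history = [(c, run_on_testbed(c)) for c in latin_hypercube(ranges, n_initial, rng)]
    for _ in range(n_rounds):
        candidates = latin_hypercube(ranges, n_candidates, rng)
        best = min(candidates, key=lambda c: predict(c, history, ranges))
        history.append((best, run_on_testbed(best)))
    return min(history, key=lambda item: item[1])


best_config, best_time = tune(PARAM_RANGES)
```

In this sketch the expensive step is run_on_testbed, which is why a smaller testbed allows more samples within the same time budget; the model is only consulted to decide which candidate is worth measuring next.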




