Boosting Cloud Data Analytics using Multi-Objective Optimization

05/07/2020
by Fei Song, et al.

Data analytics in the cloud has become an integral part of enterprise businesses. Big data analytics systems, however, still lack the ability to take user performance goals and budgetary constraints for a task, collectively referred to as task objectives, and automatically configure an analytic job to achieve these objectives. This paper presents a data analytics optimizer that can automatically determine a cluster configuration with a suitable number of cores, as well as other system parameters, that best meets the task objectives. At the core of our work is a principled multi-objective optimization (MOO) approach that computes a Pareto optimal set of job configurations to reveal tradeoffs between different user objectives, recommends a new job configuration that best explores such tradeoffs, and employs novel optimizations to enable such recommendations within a few seconds. We present efficient incremental algorithms based on the notion of a Progressive Frontier for realizing our MOO approach and implement them in a Spark-based prototype. Detailed experiments using benchmark workloads show that our MOO techniques provide a 2-50x speedup over existing MOO methods, while offering good coverage of the Pareto frontier. When compared to OtterTune, a state-of-the-art performance tuning system, our approach recommends configurations that yield a 26%-49% reduction in the running time of the TPCx-BB benchmark while adapting to different application preferences on multiple objectives.
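To make the notion of a Pareto optimal set concrete, the following minimal sketch filters a handful of candidate job configurations down to the non-dominated ones. It is a brute-force illustration only, not the Progressive Frontier algorithm described in the paper; the configuration names and the latency and cost figures are hypothetical values invented for the example.

```python
from typing import List, Tuple

# (name, latency in seconds, cost in dollars); both objectives are minimized.
# All names and numbers here are hypothetical, chosen only for illustration.
Config = Tuple[str, float, float]


def dominates(a: Config, b: Config) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep every configuration that no other configuration dominates (O(n^2) scan)."""
    return [c for c in configs if not any(dominates(o, c) for o in configs)]


configs: List[Config] = [
    ("8 cores", 120.0, 1.0),
    ("16 cores", 70.0, 1.6),
    ("32 cores", 45.0, 2.9),
    ("16 cores, untuned", 90.0, 1.8),  # slower and pricier than "16 cores"
    ("64 cores", 44.0, 6.0),
]

front = pareto_front(configs)
for name, latency, cost in front:
    print(f"{name}: {latency:.0f} s, ${cost:.2f}")
```

Each surviving configuration represents a different tradeoff between running time and cost; a recommendation step would then pick one point from this set according to the user's preferences.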

research
01/20/2021

Neural-based Modeling for Performance Tuning of Spark Data Analytics

Cloud data analytics has become an integral part of enterprise business ...
research
02/01/2018

Towards Reliable (and Efficient) Job Executions in a Practical Geo-distributed Data Analytics System

Geo-distributed data analytics are increasingly common to derive useful ...
research
08/22/2023

Karasu: A Collaborative Approach to Efficient Cluster Configuration for Big Data Analytics

Selecting the right resources for big data analytics jobs is hard becaus...
research
04/04/2023

Predicting the Performance-Cost Trade-off of Applications Across Multiple Systems

In modern computing environments, users may have multiple systems access...
research
11/15/2022

Perona: Robust Infrastructure Fingerprinting for Resource-Efficient Big Data Analytics

Choosing a good resource configuration for big data analytics applicatio...
research
06/04/2021

VEER: Disagreement-Free Multi-objective Configuration

Software comes with many configuration options, satisfying varying needs...
research
08/17/2018

Learning-based Automatic Parameter Tuning for Big Data Analytics Frameworks

Big data analytics frameworks (BDAFs) have been widely used for data pro...
