Neural-based Modeling for Performance Tuning of Spark Data Analytics

01/20/2021
by   Khaled Zaouk, et al.

Cloud data analytics has become an integral part of enterprise business operations for data-driven insight discovery. Performance modeling of cloud data analytics is crucial for performance tuning and other critical operations in the cloud. Traditional modeling techniques fail to adapt to the high degree of diversity in workloads and system behaviors in this domain. In this paper, we bring recent Deep Learning techniques to bear on the process of automated performance modeling of cloud data analytics, with a focus on Spark data analytics as representative workloads. At the core of our work is the notion of learning workload embeddings (with a set of desired properties) to represent fundamental computational characteristics of different jobs. These embeddings enable performance prediction when used together with job configurations that control resource allocation and other system knobs. Our work provides an in-depth study of different modeling choices that suit our requirements. Results of extensive experiments reveal the strengths and limitations of different modeling methods, as well as the superior performance of our best-performing method over a state-of-the-art modeling tool for cloud analytics.
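
The idea of combining a workload embedding with a job configuration to predict performance can be illustrated with a minimal sketch. This is not the paper's model: the dimensions, the two example knobs, the single hidden layer, and the name `predict_latency` are all assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
NUM_WORKLOADS = 4   # number of known jobs
EMBED_DIM = 3       # size of each workload embedding
CONFIG_DIM = 2      # e.g. (number of executors, cores per executor)
HIDDEN = 8          # hidden units in the regressor

# One embedding vector per workload. In a real system these would be
# learned from runtime traces; here they are random placeholders.
embeddings = rng.normal(size=(NUM_WORKLOADS, EMBED_DIM))

# Untrained weights of a one-hidden-layer regressor.
W1 = rng.normal(size=(EMBED_DIM + CONFIG_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(size=(HIDDEN, 1))
b2 = np.zeros(1)


def predict_latency(workload_id, config):
    """Concatenate the workload embedding with the configuration vector
    and pass the result through the regressor to get a scalar prediction."""
    x = np.concatenate([embeddings[workload_id],
                        np.asarray(config, dtype=float)])
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer
    return float((h @ W2 + b2)[0])


# Same workload, two candidate configurations.
print(predict_latency(1, [4, 2]))
print(predict_latency(1, [8, 4]))
```

Because the embedding is kept separate from the configuration, the same regressor could in principle be queried for many candidate configurations of one job, which is what makes such a model usable for tuning.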


