Predictive Price-Performance Optimization for Serverless Query Processing

12/16/2021
by Rathijit Sen, et al.

We present an efficient, parametric modeling framework for predictive resource allocation. The framework focuses on the amount of computational resources and can optimize for a range of price-performance objectives for data analytics in serverless query processing settings. We discuss and evaluate in depth how our system, AutoExecutor, uses this framework to automatically select near-optimal executor and core counts for Spark SQL queries running on Azure Synapse. Our techniques improve upon Spark's built-in, reactive, dynamic executor allocation by substantially reducing both the total executors allocated and the executor occupancy while running queries, thereby freeing up executors that can potentially be used by other concurrent queries or to reduce overall cluster provisioning needs. In contrast with post-execution analysis tools such as Sparklens, we predict resource allocations for queries before executing them, and we can also account for changes in input data sizes when predicting the desired allocations.

