Lachesis: Automated Generation of Persistent Partitionings for UDF-Centric Analytics

06/30/2020
by   Jia Zou, et al.

Persistent partitioning is effective in avoiding expensive shuffling operations. However, automating it remains a significant challenge for Big Data analytics workloads that make extensive use of user-defined functions (UDFs), where sub-computations are harder to reuse for partitioning than in relational applications. In addition, functional dependencies, which are widely used for partitioning selection, are often unavailable in the unstructured data that is ubiquitous in UDF-centric analytics. We propose the Lachesis system, which represents UDF-centric workloads as workflows of analyzable and reusable sub-computations. Lachesis further adopts a deep reinforcement learning model to infer which sub-computations should be used to partition the underlying data. This analysis is then applied to automatically optimize the storage of data across applications, improving both performance and user productivity.
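
To make the idea of partitioning by a reusable sub-computation concrete, the following is a minimal Python sketch and not the Lachesis implementation or its API. It assumes two hypothetical key-extraction UDFs (`author_of_post`, `author_of_profile`) and hypothetical helpers (`partition_by`, `co_partitioned_join`). Both datasets are partitioned once, at storage time, by the output of their UDF, so a later join on that same key can proceed partition by partition without moving records between partitions.

```python
from collections import defaultdict
from typing import Any, Callable, List, Tuple

# Hypothetical UDFs: each extracts a join key from a differently shaped record.
def author_of_post(post: str) -> str:
    # Unstructured text record of the form "<author> | <body>".
    return post.split("|", 1)[0].strip().lower()

def author_of_profile(profile: dict) -> str:
    return profile["name"].lower()

def partition_by(records: List[Any], key_fn: Callable[[Any], Any],
                 num_partitions: int) -> List[List[Any]]:
    """Place each record in the partition chosen by hashing the UDF's output."""
    partitions: List[List[Any]] = [[] for _ in range(num_partitions)]
    for record in records:
        partitions[hash(key_fn(record)) % num_partitions].append(record)
    return partitions

def co_partitioned_join(left_parts: List[List[Any]], right_parts: List[List[Any]],
                        left_key: Callable[[Any], Any],
                        right_key: Callable[[Any], Any]) -> List[Tuple[Any, Any]]:
    """Join partition i of the left side only with partition i of the right side.

    This is valid only if both sides were partitioned by the same key values
    with the same partition count; no record crosses a partition boundary.
    """
    joined: List[Tuple[Any, Any]] = []
    for left, right in zip(left_parts, right_parts):
        index = defaultdict(list)
        for l in left:
            index[left_key(l)].append(l)
        for r in right:
            for l in index.get(right_key(r), []):
                joined.append((l, r))
    return joined

# Illustrative data, invented for this sketch.
posts = ["Ann | hello", "bob | hi", "ANN | again"]
profiles = [{"name": "Ann"}, {"name": "Bob"}]

post_parts = partition_by(posts, author_of_post, 4)           # done once, at write time
profile_parts = partition_by(profiles, author_of_profile, 4)  # same key space, same count
pairs = co_partitioned_join(post_parts, profile_parts,
                            author_of_post, author_of_profile)
```

In this sketch the choice of which UDF to partition by is made by hand; the abstract describes Lachesis as inferring that choice with a deep reinforcement learning model. Python's built-in `hash` is consistent only within a single process for strings, so a persistent store would need a stable hash function instead.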


