Dias: Dynamic Rewriting of Pandas Code

03/28/2023
by Stefanos Baziotis, et al.

In recent years, dataframe libraries such as pandas have exploded in popularity. Because of their flexibility, they are increasingly used in ad-hoc exploratory data analysis (EDA) workloads. These workloads are diverse and include custom functions that can span libraries or be written in pure Python. Most systems available for accelerating EDA workloads focus on bulk-parallel workloads, which exhibit very different computational patterns, typically within a single library. As a result, their expensive optimization techniques can introduce excessive overhead on ad-hoc EDA workloads. Instead, we identify program rewriting as a lightweight technique that can offer substantial speedups while also avoiding slowdowns. We implement our techniques in Dias, which rewrites notebook cells to run more efficiently on ad-hoc EDA workloads. These techniques include dynamic checking of the preconditions under which a rewrite is correct, and just-in-time rewriting for notebook environments. We show that Dias can rewrite individual cells to run 57× faster than pandas and 1909× faster than optimized systems such as modin. Furthermore, Dias can accelerate whole notebooks by up to 3.6× over pandas and 26.4× over modin.
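The abstract combines two ideas: rewriting the source of a notebook cell, and checking at run time the preconditions under which the rewrite is correct. The sketch below illustrates that combination with Python's standard `ast` module; it is not Dias's implementation. The rewrite rule (`X.sort_values(C).head(N)` to `X.nsmallest(N, C)`), the class `GuardedSortHeadRewriter`, the function `rewrite_cell`, and the guard `_can_use_nsmallest` are all hypothetical. The rule is motivated by the pandas documentation, which describes `DataFrame.nsmallest(n, columns)` as equivalent to `sort_values(columns, ascending=True).head(n)` but more performant; the guard stands in for a check (for example, that the column has a numeric dtype) that would decide, when the cell runs, whether the fast path is safe.

```python
import ast


class GuardedSortHeadRewriter(ast.NodeTransformer):
    """Hypothetical rule: X.sort_values(C).head(N) -> X.nsmallest(N, C).

    The fast path runs only if the (hypothetical) run-time precondition
    check `_can_use_nsmallest(X, C)` passes; otherwise the original
    expression runs unchanged.
    """

    def visit_Call(self, node: ast.Call) -> ast.AST:
        # Rewrite nested calls first.
        self.generic_visit(node)
        func = node.func
        # Outer call must be `<something>.head(N)` with one positional arg.
        if not (
            isinstance(func, ast.Attribute)
            and func.attr == "head"
            and len(node.args) == 1
            and not node.keywords
            and isinstance(func.value, ast.Call)
        ):
            return node
        inner = func.value
        inner_func = inner.func
        # Inner call must be `X.sort_values(C)` with one positional arg.
        if not (
            isinstance(inner_func, ast.Attribute)
            and inner_func.attr == "sort_values"
            and len(inner.args) == 1
            and not inner.keywords
        ):
            return node
        frame, column, n = inner_func.value, inner.args[0], node.args[0]
        # X and C are evaluated in both the guard and the chosen branch, so
        # only side-effect-free forms (a plain name, a string literal) qualify.
        if not (
            isinstance(frame, ast.Name)
            and isinstance(column, ast.Constant)
            and isinstance(column.value, str)
        ):
            return node
        fast = ast.Call(
            func=ast.Attribute(value=frame, attr="nsmallest", ctx=ast.Load()),
            args=[n, column],
            keywords=[],
        )
        guard = ast.Call(
            func=ast.Name(id="_can_use_nsmallest", ctx=ast.Load()),
            args=[frame, column],
            keywords=[],
        )
        # `fast if guard else original`: the original code is the fallback.
        return ast.IfExp(test=guard, body=fast, orelse=node)


def rewrite_cell(source: str) -> str:
    """Parse a notebook cell, apply the rule, and return the new source."""
    tree = GuardedSortHeadRewriter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


if __name__ == "__main__":
    print(rewrite_cell("top = df.sort_values('price').head(3)"))
    # top = df.nsmallest(3, 'price') if _can_use_nsmallest(df, 'price') else df.sort_values('price').head(3)
```

Keeping the original expression as the `else` branch means a failed precondition costs only the check itself, never correctness. In IPython, a function like `rewrite_cell` could plausibly be wired in as an input transformer (for example through `get_ipython().input_transformers_cleanup`, which expects a function from a list of lines to a list of lines) so that each cell is rewritten just before it executes; the abstract does not say which hook Dias actually uses.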
