Splitability Annotations: Optimizing Black-Box Function Composition in Existing Libraries

by Shoumik Palkar, et al.

Data movement is a major bottleneck in parallel data-intensive applications. In response to this problem, researchers have proposed new runtimes and intermediate representations (IRs) that apply optimizations such as loop fusion under existing library APIs. Even though these runtimes generally do not require changes to user code, they require intrusive changes to the library itself: often, all the library functions need to be rewritten for a new IR or virtual machine. In this paper, we propose a new abstraction called splitability annotations (SAs) that enables key data movement optimizations on black-box library functions. SAs only require that users add an annotation for existing, unmodified functions and implement a small API to split data values in the library. Together, this interface describes how to partition values that are passed among functions to enable data pipelining and automatic parallelization while respecting each library's correctness constraints. We implement SAs in a system called Mozart. Without modifying any library function, on workloads using NumPy and Pandas in Python and Intel MKL in C, Mozart is in many cases competitive with intrusive solutions that require rewriting libraries, sometimes improves performance over past systems by up to 2x, and accelerates workloads by up to 30x.
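To make the idea of splitting values and pipelining them through unmodified functions concrete, here is a minimal sketch in Python with NumPy. It is not Mozart's interface: the names `split`, `merge`, and `pipeline` are hypothetical, and the sketch assumes every function in the chain is elementwise, so that running it on a chunk gives the same result as running it on the whole array.

```python
import numpy as np

def split(arr, chunk_size):
    """Hypothetical split API: yield contiguous slices of a 1-D array."""
    for start in range(0, len(arr), chunk_size):
        # Basic slicing returns a view, so no input data is copied here.
        yield arr[start:start + chunk_size]

def merge(pieces):
    """Hypothetical merge API: reassemble the per-chunk results in order."""
    return np.concatenate(pieces)

def pipeline(funcs, arr, chunk_size=4096):
    """Run every function on one chunk before moving to the next chunk.

    Assumes each function is elementwise; the library functions
    themselves are called as-is and are never modified.
    """
    results = []
    for chunk in split(arr, chunk_size):
        for func in funcs:
            chunk = func(chunk)  # intermediates are only chunk-sized
        results.append(chunk)
    return merge(results)

x = np.linspace(0.0, 1.0, 10_000)

# Unsplit version: each call scans and materializes a full-size array.
unsplit = np.sqrt(np.exp(x))

# Pipelined version: the same black-box functions applied chunk by chunk.
pipelined = pipeline([np.exp, np.sqrt], x, chunk_size=1024)
```

In this sketch the only library-specific knowledge lives in `split` and `merge`; `np.exp` and `np.sqrt` are treated as black boxes. Functions that are not elementwise, such as reductions, would need a different merge step, which is the kind of correctness constraint the annotations are meant to capture.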



