On the Representation of Partially Specified Implementations and its Application to the Optimization of Linear Algebra Kernels on GPU

04/06/2019
by Ulysse Beaugnon, et al.

Traditional optimizing compilers rely on rewrite rules to iteratively apply program transformations. This iterative approach hides optimization opportunities behind intermediate transformation steps. For instance, vectorization can only be applied to the innermost loop in a nest: one must first perform a loop interchange before even considering vectorization of an outer loop. In contrast, we propose an implementation framework that represents programs as sets of possible implementation decisions. Specifying one decision can constrain others in both directions: specifying that a loop must be vectorized prevents other loops from being nested inside it; conversely, specifying a loop as an outer loop prevents it from being vectorized. These optimization decisions commute, obviating the pass ordering problem. We present a constraint programming system to formally define, represent and explore such implementation spaces. We also propose an exploration strategy combining tree search and branch-and-bound; the strength and novelty of this strategy reside in an analytical model of the lower bound on the execution time of a set of possible implementations. We showcase our approach on the construction and exploration of an implementation space for linear algebra kernels running on GPUs. We show that this search space is expressive enough to represent complex decisions that fundamentally change the structure of the generated code. We also present preliminary results competitive with the performance of native GPU libraries.
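The abstract does not spell out the constraint system or the performance model, so what follows is only a minimal sketch of the general idea, not the authors' implementation. It models a hypothetical two-loop nest (loops `i` and `j`) as a set of open decisions (`order`, `vec_i`, `vec_j`). The rule "only the innermost loop may be vectorized" is encoded as nogoods that propagate in both directions. The space is then explored by depth-first tree search with branch-and-bound. The decision names, the cost numbers (`ORDER_PENALTY`, `SPEEDUP`, `WORK`) and the lower bound are all illustrative assumptions. The bound relaxes the problem so that each open decision takes its best remaining value independently, which means it never overestimates the cost of any completion.

```python
from typing import Dict, FrozenSet, Optional

Domains = Dict[str, FrozenSet]

# Hypothetical decisions for a two-loop nest (loops i and j).
INITIAL: Domains = {
    "order": frozenset({"i_outer", "j_outer"}),  # which loop is outermost
    "vec_i": frozenset({False, True}),           # vectorize loop i?
    "vec_j": frozenset({False, True}),           # vectorize loop j?
}

# Binary nogoods (x, a, y, b): "x == a and y == b" may not hold together.
# Together they say: only the innermost loop may be vectorized.
NOGOODS = [
    ("vec_i", True, "order", "i_outer"),
    ("vec_j", True, "order", "j_outer"),
]

def propagate(doms: Domains) -> Optional[Domains]:
    """Prune domains to a fixpoint; return None if some domain becomes empty."""
    doms = dict(doms)
    changed = True
    while changed:
        changed = False
        for x, a, y, b in NOGOODS:
            # Fixing x to a removes b from y, and fixing y to b removes a
            # from x: propagation works in both directions.
            for u, uv, v, vv in ((x, a, y, b), (y, b, x, a)):
                if doms[u] == {uv} and vv in doms[v]:
                    doms[v] = doms[v] - {vv}
                    changed = True
        if any(not d for d in doms.values()):
            return None
    return doms

# Hypothetical cost model: locality penalty plus work divided by vector speedups.
ORDER_PENALTY = {"i_outer": 0.0, "j_outer": 8.0}
SPEEDUP = {"vec_i": 2.0, "vec_j": 4.0}
WORK = 64.0

def speedup(var: str, value: bool) -> float:
    return SPEEDUP[var] if value else 1.0

def lower_bound(doms: Domains) -> float:
    """Optimistic bound: every open decision takes its best value independently.
    On a fully specified implementation this is exactly its cost."""
    best_order = min(ORDER_PENALTY[o] for o in doms["order"])
    best_vi = max(speedup("vec_i", v) for v in doms["vec_i"])
    best_vj = max(speedup("vec_j", v) for v in doms["vec_j"])
    return best_order + WORK / (best_vi * best_vj)

def branch_and_bound(doms: Domains, best=(float("inf"), None)):
    """Depth-first tree search that prunes subtrees whose bound cannot win."""
    doms = propagate(doms)
    if doms is None:
        return best                        # constraints violated: dead end
    bound = lower_bound(doms)
    if bound >= best[0]:
        return best                        # cannot beat the incumbent: prune
    open_vars = [v for v in doms if len(doms[v]) > 1]
    if not open_vars:
        return (bound, doms)               # fully specified: bound is exact
    var = open_vars[0]
    for value in sorted(doms[var]):
        child = dict(doms)
        child[var] = frozenset({value})
        best = branch_and_bound(child, best)
    return best

if __name__ == "__main__":
    cost, impl = branch_and_bound(INITIAL)
    print(cost, {k: next(iter(v)) for k, v in impl.items()})
    # Fixing vec_i = True forces j to be the outer loop, and therefore vec_j = False.
    print(propagate({**INITIAL, "vec_i": frozenset({True})}))
```

Because all three decisions stay open until the search fixes them, the search never commits to a loop order before it has considered vectorization. After `order = j_outer` is fixed, propagation limits the subtree to a bound of 40. That is worse than the incumbent of 16, so the subtree is pruned without being enumerated.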


research
10/13/2020

Autotuning Search Space for Loop Transformations

One of the challenges for optimizing compilers is to predict whether app...
research
05/10/2021

Customized Monte Carlo Tree Search for LLVM/Polly's Composable Loop Optimization Transformations

Polly is the LLVM project's polyhedral loop nest optimizer. Recently, us...
research
01/23/2013

Representing and Combining Partially Specified CPTs

This paper extends previous work with network fragments and situation-sp...
research
11/24/2016

Automating the Last-Mile for High Performance Dense Linear Algebra

High performance dense linear algebra (DLA) libraries often rely on a ge...
research
06/12/2019

Loop Programming Practices that Simplify Quicksort Implementations

Quicksort algorithm with Hoare's partition scheme is traditionally imple...
research
03/04/2022

Machine Learning for CUDA+MPI Design Rules

We present a new strategy for automatically exploring the design space o...
research
09/09/2023

Compiling Recurrences over Dense and Sparse Arrays

Recurrence equations lie at the heart of many computational paradigms in...
