Towards an Achievable Performance for the Loop Nests

02/02/2019
by Aniket Shivam, et al.

Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the code on a target architecture, including by exposing parallelism. Finding and evaluating an optimal, semantics-preserving sequence of transformations is a complex problem. The sequence is guided by heuristics and/or analytical models, and there is no way of knowing how close it gets to optimal performance or whether there is any headroom for improvement. This paper makes two contributions. First, it uses a comparative analysis of loop optimizations/transformations across multiple compilers to determine how much headroom may exist for each compiler. Second, it presents an approach to characterize loop nests based on their hardware performance counter values, together with a Machine Learning approach that predicts which compiler will generate the fastest code for a loop nest. The prediction is made both for auto-vectorized serial compilation and for auto-parallelization. The results, which are based on the Machine Learning predictions, show that the headroom for state-of-the-art compilers ranges from 1.10x to 1.42x for the serial code and from 1.30x to 1.71x for the auto-parallelized code.
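
To make the notion of headroom concrete, the following is a minimal sketch of one plausible way to compute it from per-loop timings. The abstract does not give the exact definition or aggregation used in the paper, so the geometric mean, the function names, the compiler labels `cc_x` and `cc_y`, and the toy timings are all assumptions made for illustration; none of the numbers are measurements.

```python
from math import exp, log


def geometric_mean(values):
    """Geometric mean of a non-empty sequence of positive numbers."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


def headroom(timings, baseline, chosen=None):
    """Estimate the headroom of `baseline` over a set of loop nests.

    timings: {loop_name: {compiler_name: seconds}}
    baseline: the compiler whose headroom is being estimated
    chosen: {loop_name: compiler_name} picked by some predictor, or None
            to use the fastest compiler per loop (an oracle choice).

    Assumption: headroom is the geometric mean, over loop nests, of
    baseline time divided by the time of the selected compiler.
    """
    speedups = []
    for loop, per_compiler in timings.items():
        if chosen is None:
            selected_time = min(per_compiler.values())
        else:
            selected_time = per_compiler[chosen[loop]]
        speedups.append(per_compiler[baseline] / selected_time)
    return geometric_mean(speedups)


# Hypothetical toy timings in seconds, for illustration only.
timings = {
    "loop_a": {"cc_x": 2.0, "cc_y": 1.0},
    "loop_b": {"cc_x": 1.0, "cc_y": 4.0},
}

oracle_headroom_x = headroom(timings, "cc_x")
oracle_headroom_y = headroom(timings, "cc_y")

# A predictor that always selects cc_y can do worse than the baseline.
always_y = {"loop_a": "cc_y", "loop_b": "cc_y"}
predicted_headroom_x = headroom(timings, "cc_x", chosen=always_y)
```

With an oracle choice the value can never fall below 1.0, because the baseline itself is always among the candidates. With a learned predictor it can, which is why a headroom figure derived from predictions depends on the predictor's accuracy.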

