AI Powered Compiler Techniques for DL Code Optimization

04/12/2021
by Sanket Tavarageri, et al.

Creating high-performance implementations of deep learning primitives on CPUs is a challenging task. Multiple considerations, including the multi-level cache hierarchy and the wide SIMD units of CPU platforms, influence the choice of program transformations to apply for performance optimization. In this paper, we present machine learning powered compiler techniques to optimize loop nests. We take a two-pronged approach to code optimization: we first apply high-level optimizations so that the code takes optimal advantage of the cache memories. Then, we perform low-level, target-specific optimizations to effectively vectorize the code so that it runs well on the SIMD units of the machine. For the high-level optimizations, we use polyhedral compilation techniques and deep learning approaches. For the low-level optimization, we use a target-specific code generator that emits code using vector intrinsics, together with Reinforcement Learning (RL) techniques that find the optimal parameters for the code generator. We experimentally evaluate the developed techniques on matrix multiplications that occur in popular deep learning workloads. The results show that the compiler techniques presented in the paper achieve 7.6X and 8.2X speed-ups over a baseline for sequential and parallel runs, respectively.
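
To make the two-pronged approach concrete, the following is a minimal C sketch (not the paper's code generator) of the two optimization levels the abstract describes: loop tiling of a matrix multiplication for cache locality, and an inner loop vectorized with AVX2/FMA intrinsics. The tile sizes TI/TJ/TK and the specific intrinsics are illustrative assumptions; in the paper's framework, the high-level (polyhedral plus deep learning) and low-level (RL-driven) components would choose such parameters automatically.

    /* Tiled, vectorized C += A * B (row-major). A minimal sketch, assuming
       AVX2 and FMA are available; compile with: gcc -O2 -mavx2 -mfma */
    #include <immintrin.h>
    #include <stddef.h>

    #define TI 64   /* illustrative tile sizes, not the paper's values */
    #define TJ 64
    #define TK 64

    static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

    /* C[M][N] += A[M][K] * B[K][N]; N is assumed to be a multiple of 8
       here to keep the sketch free of remainder handling. */
    void matmul_tiled(const float *A, const float *B, float *C,
                      size_t M, size_t N, size_t K)
    {
        /* High-level optimization: tile all three loops so that the
           working set of each tile fits in the cache hierarchy. */
        for (size_t ii = 0; ii < M; ii += TI)
          for (size_t kk = 0; kk < K; kk += TK)
            for (size_t jj = 0; jj < N; jj += TJ)
              for (size_t i = ii; i < min_sz(ii + TI, M); i++)
                for (size_t k = kk; k < min_sz(kk + TK, K); k++) {
                    /* Low-level optimization: broadcast one element of A
                       and stream across a row of B with 8-wide FMAs. */
                    __m256 a = _mm256_set1_ps(A[i * K + k]);
                    for (size_t j = jj; j < min_sz(jj + TJ, N); j += 8) {
                        __m256 b = _mm256_loadu_ps(&B[k * N + j]);
                        __m256 c = _mm256_loadu_ps(&C[i * N + j]);
                        c = _mm256_fmadd_ps(a, b, c);
                        _mm256_storeu_ps(&C[i * N + j], c);
                    }
                }
    }

In the setting the abstract describes, parameters such as the tile sizes and the shape of the vectorized inner kernel would be tuned by the learning components (polyhedral analysis plus deep learning at the high level, RL at the low level) rather than fixed by hand as above.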


Related research

02/12/2018 - TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
There is an increasing need to bring machine learning to a wide diversit...

11/08/2017 - Correctness of Speculative Optimizations with Dynamic Deoptimization
High-performance dynamic language implementations make heavy use of spec...

05/12/2021 - Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads
Recent trend towards increasing large machine learning models require bo...

07/29/2019 - A Proposed Model for Automatic Loop Optimization in the Tiramisu Compiler: The Case of Loop Unrolling
Computer architectures become more and more complex. It requires more ef...

05/22/2018 - Compiling with Continuations and LLVM
LLVM is an infrastructure for code generation and low-level optimization...

08/19/2020 - Compiling ONNX Neural Network Models Using MLIR
Deep neural network models are becoming increasingly popular and have be...

09/23/2019 - Compiler-Level Matrix Multiplication Optimization for Deep Learning
An important linear algebra routine, GEneral Matrix Multiplication (GEMM...
