PolyScientist: Automatic Loop Transformations Combined with Microkernels for Optimization of Deep Learning Primitives

02/06/2020
by Sanket Tavarageri, et al.

At the heart of deep learning training and inference are computationally intensive primitives, such as convolutions, that form the building blocks of deep neural networks. Researchers have taken two distinct approaches to creating high-performance implementations of deep learning kernels: 1) library development, exemplified by Intel MKL-DNN for CPUs, and 2) automatic compilation, represented by the TensorFlow XLA compiler. Each approach has its drawbacks: although a custom-built library can deliver very good performance, the cost and time of developing it can be high, while automatic compilation is attractive but, to date, automatically generated implementations lag expert-coded kernels in performance by orders of magnitude. In this paper, we develop a hybrid solution to the development of deep learning kernels that achieves the best of both worlds: expert-coded microkernels are used for the innermost loops of a kernel, and advanced polyhedral technology automatically tunes the outer loops for performance. We design a novel polyhedral-model-based data reuse algorithm to optimize the outer loops of the kernel. Through experimental evaluation on an important class of deep learning primitives, namely convolutions, we demonstrate that our approach attains the same levels of performance as Intel MKL-DNN, a hand-coded deep learning library.
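To make the hybrid structure concrete, below is a minimal C sketch (not the paper's actual code) of a direct convolution split along the lines the abstract describes: an expert-coded microkernel handles the innermost loops, while the surrounding loop nest is the part a polyhedral optimizer would reorder and tile for data reuse. All names (conv2d, microkernel, the block width VLEN) and the data layouts are illustrative assumptions; in a real library the microkernel would be hand-vectorized with intrinsics rather than plain scalar C.

```c
/* Sketch of the microkernel/outer-loop split described in the abstract.
 * Layouts (assumed): input  [IH][IW][C]  with IH = H+R-1, IW = W+S-1 (pre-padded)
 *                    weights [R][S][K][C]
 *                    output  [H][W][K]                                        */
#include <stdio.h>
#include <stdlib.h>

#define H 8    /* output height  */
#define W 8    /* output width   */
#define C 16   /* input channels */
#define K 16   /* output channels (assumed a multiple of VLEN) */
#define R 3    /* filter height  */
#define S 3    /* filter width   */
#define VLEN 4 /* microkernel register-block width, hypothetical */

/* Innermost loops: the expert-coded part. Here plain scalar C stands in
 * for a hand-vectorized microkernel; its loop structure is fixed.        */
static void microkernel(const float *in, const float *wt, float *out) {
    for (int k = 0; k < VLEN; k++)      /* register block over output channels */
        for (int c = 0; c < C; c++)     /* reduction over input channels       */
            out[k] += in[c] * wt[k * C + c];
}

/* Outer loops: the part a polyhedral optimizer would reorder and tile
 * for data reuse, leaving the microkernel call untouched.               */
static void conv2d(const float *in, const float *wt, float *out) {
    for (int kb = 0; kb < K / VLEN; kb++)
        for (int h = 0; h < H; h++)
            for (int w = 0; w < W; w++)
                for (int r = 0; r < R; r++)
                    for (int s = 0; s < S; s++)
                        microkernel(
                            &in[((h + r) * (W + S - 1) + (w + s)) * C],
                            &wt[((r * S + s) * K + kb * VLEN) * C],
                            &out[(h * W + w) * K + kb * VLEN]);
}

int main(void) {
    const int IH = H + R - 1, IW = W + S - 1;
    float *in  = calloc((size_t)IH * IW * C, sizeof *in);
    float *wt  = calloc((size_t)R * S * K * C, sizeof *wt);
    float *out = calloc((size_t)H * W * K, sizeof *out);

    for (int i = 0; i < IH * IW * C; i++)    in[i] = 1.0f;
    for (int i = 0; i < R * S * K * C; i++)  wt[i] = 0.01f;

    conv2d(in, wt, out);
    /* Each output point accumulates R*S*C products of 1.0 * 0.01. */
    printf("out[0] = %f (expect %f)\n", out[0], R * S * C * 0.01f);

    free(in); free(wt); free(out);
    return 0;
}
```

In this split, only the five outer loops are visible to the loop optimizer; their order and tile sizes can be changed freely without touching the microkernel, which is the property the hybrid approach exploits.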


