Compiler-Level Matrix Multiplication Optimization for Deep Learning

09/23/2019
by Huaqing Zhang, et al.

An important linear algebra routine, GEneral Matrix Multiplication (GEMM), is a fundamental operator in deep learning. Compilers need to translate these routines into low-level code optimized for specific hardware. Compiler-level optimization of GEMM has a significant impact on the performance of training and executing deep learning models. However, most deep learning frameworks rely on hardware-specific operator libraries in which GEMM optimization has mostly been achieved by manual tuning, which restricts performance on different target hardware. In this paper, we propose two novel algorithms for GEMM optimization based on the TVM framework: a lightweight Greedy Best First Search (G-BFS) method based on heuristic search, and a Neighborhood Actor Advantage Critic (N-A2C) method based on reinforcement learning. Experimental results show significant performance improvement from the proposed methods, in both the optimality of the solution and the cost of search in terms of time and fraction of the search space explored. Specifically, the proposed methods achieve 24% and 40% savings in GEMM computation time over state-of-the-art XGBoost and RNN methods, respectively, while exploring only 0.1% of the search space. The proposed approaches have the potential to be applied to other operator-level optimizations.
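To make the idea of heuristic search over GEMM configurations concrete, here is a minimal Python sketch of a generic greedy best-first search over loop tiling factors. It is not the authors' G-BFS algorithm, whose details the abstract does not give. The configuration encoding (one `(outer, inner)` split per matrix dimension), the neighbor rule (moving a factor of 2 between the two factors) and the function `toy_cost` are all assumptions made for illustration; a real tuner would replace `toy_cost` with a measured run time on the target hardware.

```python
import heapq
import math


def neighbors(config):
    """Yield configs that move a factor of 2 between the two tiling
    factors of one dimension, so each dimension's product is preserved."""
    for d, (outer, inner) in enumerate(config):
        candidates = []
        if outer % 2 == 0:
            candidates.append((outer // 2, inner * 2))
        if inner % 2 == 0:
            candidates.append((outer * 2, inner // 2))
        for cand in candidates:
            yield config[:d] + (cand,) + config[d + 1:]


def toy_cost(config):
    """Hypothetical stand-in for a measured GEMM run time.
    It simply pretends that an inner tile size of 8 is ideal."""
    return sum(abs(math.log2(inner) - 3) for _, inner in config)


def greedy_best_first(start, cost_fn, budget):
    """Always expand the cheapest configuration seen so far, and stop
    once `budget` configurations have been evaluated."""
    visited = {start}
    best_cfg, best_cost = start, cost_fn(start)
    frontier = [(best_cost, start)]  # min-heap ordered by cost
    evaluations = 1
    while frontier and evaluations < budget:
        _, cfg = heapq.heappop(frontier)
        for nxt in neighbors(cfg):
            if nxt in visited:
                continue
            visited.add(nxt)
            cost = cost_fn(nxt)
            evaluations += 1
            heapq.heappush(frontier, (cost, nxt))
            if cost < best_cost:
                best_cfg, best_cost = nxt, cost
            if evaluations >= budget:
                break
    return best_cfg, best_cost, evaluations


# Tile a 64 x 64 x 64 GEMM: each dimension starts untiled as (64, 1).
start = ((64, 1), (64, 1), (64, 1))
best_cfg, best_cost, evaluations = greedy_best_first(start, toy_cost, budget=60)
print(best_cfg, best_cost, evaluations)
```

In this toy setting each dimension has 7 possible splits, giving 343 configurations in total, and the search evaluates at most 60 of them. The budget plays the role of the "fraction of the search space explored" mentioned in the abstract, although the numbers here say nothing about the paper's results.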


research
02/12/2018

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

There is an increasing need to bring machine learning to a wide diversit...
research
08/11/2020

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

Accelerating deep model training and inference is crucial in practice. E...
research
11/02/2020

Cortex: A Compiler for Recursive Deep Learning Models

Optimizing deep learning models is generally performed in two steps: (i)...
research
12/22/2022

EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models

With the advent of deep learning application on edge devices, researcher...
research
03/14/2023

Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization

Performance optimization is an increasingly challenging but often repeti...
research
04/12/2021

AI Powered Compiler Techniques for DL Code Optimization

Creating high performance implementations of deep learning primitives on...
research
09/04/2023

LoopTune: Optimizing Tensor Computations with Reinforcement Learning

Advanced compiler technology is crucial for enabling machine learning ap...
