PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

07/11/2023
by   Zixuan Ma, et al.
0

Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus on optimizing computation efficiency. However, memory access is becoming a key performance bottleneck because the computational performance of accelerators is increasing much faster than memory performance. The lack of direct description of memory access and data dependence in current tensor compilers' intermediate representation (IR) brings significant challenges to generate memory-efficient code. In this paper, we propose IntelliGen, a tensor compiler that can generate high-performance code for memory-intensive operators by considering both computation and data movement optimizations. IntelliGen represent a DNN program using GIR, which includes primitives indicating its computation, data movement, and parallel strategies. This information will be further composed as an instruction-level dataflow graph to perform holistic optimizations by searching different memory access patterns and computation operations, and generating memory-efficient code on different hardware. We evaluate IntelliGen on NVIDIA GPU, AMD GPU, and Cambricon MLU, showing speedup up to 1.97x, 2.93x, and 16.91x(1.28x, 1.23x, and 2.31x on average), respectively, compared to current most performant frameworks.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/03/2023

oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation

With the rapid development of deep learning models and hardware support ...
research
05/02/2018

Glow: Graph Lowering Compiler Techniques for Neural Networks

This paper presents the design of Glow, a machine learning compiler for ...
research
05/02/2022

LoopStack: a Lightweight Tensor Algebra Compiler Stack

We present LoopStack, a domain specific compiler stack for tensor operat...
research
10/29/2022

Enabling Data Movement and Computation Pipelining in Deep Learning Compiler

Pipelining between data loading and computation is a critical tensor pro...
research
10/12/2018

ISA Mapper: A Compute and Hardware Agnostic Deep Learning Compiler

Domain specific accelerators present new challenges and opportunities fo...
research
11/07/2019

MERIT: Tensor Transform for Memory-Efficient Vision Processing on Parallel Architectures

Computationally intensive deep neural networks (DNNs) are well-suited to...
research
01/15/2022

Moses: Efficient Exploitation of Cross-device Transferable Features for Tensor Program Optimization

Achieving efficient execution of machine learning models has attracted s...

Please sign up or login with your details

Forgot password? Click here to reset