LoopStack: a Lightweight Tensor Algebra Compiler Stack

05/02/2022
by Bram Wasti, et al.

We present LoopStack, a domain-specific compiler stack for tensor operations, composed of a frontend, LoopTool, and an efficient optimizing code generator, LoopNest. This stack enables us to compile entire neural networks and generate code targeting the AVX2, AVX512, NEON, and NEONfp16 instruction sets while incorporating optimizations often missing from other machine learning compiler backends. We evaluate our stack on a collection of full neural networks and commonly used network blocks, as well as on individual operators, and show that in both cases LoopStack generates machine code that matches and frequently exceeds the performance of state-of-the-art machine learning frameworks. We also show that, for a large collection of schedules, LoopNest's compilation is orders of magnitude faster than LLVM's, while yielding equal or better run-time performance. Additionally, LoopStack has a very small memory footprint: its binary size of 245KB and its under 30K lines of effective code make it ideal for use on mobile and embedded devices.
