TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

02/12/2018
by Tianqi Chen, et al.

There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. TVM also automatically optimizes low-level programs for hardware characteristics by employing a novel learning-based cost modeling method for rapid exploration of code optimizations. Experimental results demonstrate that TVM delivers performance across hardware back-ends that is competitive with state-of-the-art hand-tuned libraries for low-power CPUs, mobile GPUs, and server-class GPUs. We also demonstrate TVM's ability to support new accelerator back-ends by targeting an FPGA-based generic deep learning accelerator. The system is open source and in production use at several major companies.
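The central idea behind learning-based cost modeling, using a model trained on a few real measurements to decide which candidate programs are worth measuring next, can be illustrated with a toy sketch. Everything below is an assumption for illustration and is not TVM's API or its actual model: the tile-size search space (`SPACE`), the synthetic `measure_cost` function standing in for timing a compiled kernel on real hardware, the assumed cache size `CACHE_FLOATS`, and the k-nearest-neighbour regressor (`predict`) standing in for the learned cost model.

```python
import math
import random

# Hypothetical search space: (tx, ty) tile sizes for a blocked matrix multiply.
TILE_SIZES = [1, 2, 4, 8, 16, 32, 64]
SPACE = [(tx, ty) for tx in TILE_SIZES for ty in TILE_SIZES]

CACHE_FLOATS = 1024  # assumed cache capacity, in float elements


def measure_cost(config):
    """Stand-in for running the compiled kernel on real hardware.

    This synthetic cost is an assumption: larger tiles improve data reuse
    until the tile's working set overflows the assumed cache.
    """
    tx, ty = config
    footprint = tx * ty + tx + ty          # elements kept live per tile
    reuse_cost = 1.0 / tx + 1.0 / ty       # fewer reloads with bigger tiles
    spill_cost = max(0, footprint - CACHE_FLOATS) / CACHE_FLOATS
    return reuse_cost + 4.0 * spill_cost


def features(config):
    """Map a configuration to a feature vector for the cost model."""
    tx, ty = config
    return (math.log2(tx), math.log2(ty))


def predict(config, history, k=3):
    """k-nearest-neighbour regression over already-measured configurations."""
    f = features(config)
    nearest = sorted(history, key=lambda item: math.dist(f, features(item[0])))[:k]
    return sum(cost for _, cost in nearest) / len(nearest)


def tune(n_init=4, batch=4, rounds=5, seed=0):
    """Model-guided search: measure only the configs the model ranks best."""
    rng = random.Random(seed)
    history = [(c, measure_cost(c)) for c in rng.sample(SPACE, n_init)]
    for _ in range(rounds):
        seen = {c for c, _ in history}
        candidates = [c for c in SPACE if c not in seen]
        if not candidates:
            break
        # Rank unmeasured configs by predicted cost, then "benchmark" the top few.
        candidates.sort(key=lambda c: predict(c, history))
        history += [(c, measure_cost(c)) for c in candidates[:batch]]
    best = min(history, key=lambda item: item[1])
    return best, len(history)
```

Calling `tune()` returns the best measured configuration with its cost, plus how many configurations were actually measured (24 of the 49 in this toy space). The design point is that the cost model, rather than exhaustive benchmarking, decides where the expensive hardware measurements are spent.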

02/12/2018

TVM: End-to-End Optimization Stack for Deep Learning

Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch drive...
05/21/2018

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for ...
11/02/2020

Cortex: A Compiler for Recursive Deep Learning Models

Optimizing deep learning models is generally performed in two steps: (i)...
09/23/2019

Compiler-Level Matrix Multiplication Optimization for Deep Learning

An important linear algebra routine, GEneral Matrix Multiplication (GEMM...
04/12/2021

AI Powered Compiler Techniques for DL Code Optimization

Creating high performance implementations of deep learning primitives on...
08/11/2020

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

Accelerating deep model training and inference is crucial in practice. E...
07/09/2022

TensorIR: An Abstraction for Automatic Tensorized Program Optimization

Deploying deep learning models on various devices has become an importan...