TVM: End-to-End Optimization Stack for Deep Learning

02/12/2018
by Tianqi Chen et al.

Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch, drive the current popularity and utility of deep learning. However, these frameworks are optimized for a narrow range of server-class GPUs, and deploying workloads to other platforms such as mobile phones, embedded devices, and specialized accelerators (e.g., FPGAs, ASICs) requires laborious manual effort. We propose TVM, an end-to-end optimization stack that exposes graph-level and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends. We discuss the optimization challenges specific to deep learning that TVM solves: high-level operator fusion, low-level memory reuse across threads, mapping to arbitrary hardware primitives, and memory latency hiding. Experimental results demonstrate that TVM delivers performance across hardware back-ends that is competitive with state-of-the-art libraries for low-power CPUs and server-class GPUs. We also demonstrate TVM's ability to target new hardware accelerator back-ends by compiling for an FPGA-based generic deep learning accelerator. The compiler infrastructure is open sourced.
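
To make the operator-level half of this design concrete, the sketch below declares a vector addition in TVM's tensor expression API and applies a simple schedule (split, parallelize, vectorize) before compiling it for a CPU back-end. This is an illustrative example, not code from the paper; module paths follow recent TVM releases (tvm.te), while the 2018 release exposed the same concepts under tvm.placeholder, tvm.compute, and tvm.create_schedule.

    # Minimal sketch of TVM's compute/schedule separation (illustrative, recent tvm.te API).
    import numpy as np
    import tvm
    from tvm import te

    n = 1024
    A = te.placeholder((n,), name="A", dtype="float32")
    B = te.placeholder((n,), name="B", dtype="float32")
    # Declarative compute rule: what to compute, with no loop order or hardware mapping fixed yet.
    C = te.compute((n,), lambda i: A[i] + B[i], name="C")

    # The schedule decides how the computation maps onto the target:
    # split the loop, run the outer loop across CPU threads, vectorize the inner loop.
    s = te.create_schedule(C.op)
    xo, xi = s[C].split(C.op.axis[0], factor=64)
    s[C].parallel(xo)
    s[C].vectorize(xi)

    # Lower and compile for a CPU back-end; other targets (e.g. "cuda") reuse the
    # same compute rule with a different schedule.
    fadd = tvm.build(s, [A, B, C], target="llvm", name="vector_add")

    dev = tvm.cpu(0)
    a = tvm.nd.array(np.random.uniform(size=n).astype("float32"), dev)
    b = tvm.nd.array(np.random.uniform(size=n).astype("float32"), dev)
    c = tvm.nd.array(np.zeros(n, dtype="float32"), dev)
    fadd(a, b, c)
    np.testing.assert_allclose(c.numpy(), a.numpy() + b.numpy(), rtol=1e-5)

The same compute declaration can be rescheduled for other back-ends, which is the performance-portability argument the abstract makes; graph-level passes such as operator fusion then operate above this layer.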

