VTA: An Open Hardware-Software Stack for Deep Learning

07/11/2018
by Thierry Moreau, et al.

Hardware acceleration is an enabler for ubiquitous and efficient deep learning. With hardware accelerators being introduced in datacenter and edge devices, it is time to acknowledge that hardware specialization is central to the deep learning system stack. This technical report presents the Versatile Tensor Accelerator (VTA), an open, generic, and customizable deep learning accelerator design. VTA is a programmable accelerator that exposes a RISC-like programming abstraction to describe operations at the tensor level. We designed VTA to expose the most salient and common characteristics of mainstream deep learning accelerators, such as tensor operations, DMA loads/stores, and explicit compute/memory arbitration. VTA is more than a standalone accelerator design: it is an end-to-end solution that includes drivers, a JIT runtime, and an optimizing compiler stack based on TVM. The current release of VTA includes a behavioral hardware simulator, as well as the infrastructure to deploy VTA on low-cost FPGA development boards for fast prototyping. By extending the TVM stack with a customizable, open-source deep learning hardware accelerator design, we expose a transparent deep learning stack from the high-level framework down to the actual hardware design and implementation, forming a truly end-to-end, software-to-hardware open-source stack for deep learning systems.
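To make the "RISC-like programming abstraction at the tensor level" concrete, here is a minimal, illustrative sketch in Python of the idea the abstract describes: a small instruction stream of coarse-grained tensor operations (DMA load, tensor compute, DMA store) executed against an on-chip scratchpad. The opcode names (`LOAD`, `GEMM`, `STORE`), the `ToyAccelerator` class, and the interpreter are assumptions made up for this example; they are not the actual VTA ISA or its simulator.

```python
# Illustrative sketch only: a toy model of a VTA-style tensor instruction
# stream. All names here are invented for illustration, not VTA's real ISA.
import numpy as np

class ToyAccelerator:
    """Models two ideas from the abstract: operations at the tensor level,
    and explicit DMA loads/stores into an on-chip scratchpad (SRAM)."""

    def __init__(self):
        self.sram = {}  # on-chip scratchpad: buffer name -> tensor tile

    def load(self, name, dram_tensor):
        # DMA load: copy a tile from "DRAM" (host memory) into SRAM.
        self.sram[name] = np.asarray(dram_tensor, dtype=np.int32)

    def gemm(self, dst, a, b):
        # Tensor-level compute: one instruction performs a whole
        # matrix-matrix multiply on SRAM-resident operands.
        self.sram[dst] = self.sram[a] @ self.sram[b]

    def store(self, name):
        # DMA store: copy a result tile from SRAM back to "DRAM".
        return self.sram[name]

def run(program, accel):
    """Interpret a RISC-like instruction list of (opcode, *operands)."""
    result = None
    for opcode, *args in program:
        if opcode == "LOAD":
            accel.load(*args)
        elif opcode == "GEMM":
            accel.gemm(*args)
        elif opcode == "STORE":
            result = accel.store(*args)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return result

# Example instruction stream: C = A @ B, with A the identity matrix,
# so the stored result equals B.
prog = [
    ("LOAD", "A", [[1, 0], [0, 1]]),
    ("LOAD", "B", [[2, 3], [4, 5]]),
    ("GEMM", "C", "A", "B"),
    ("STORE", "C"),
]
result = run(prog, ToyAccelerator())
```

The point of this sketch is the granularity: each instruction moves or computes an entire tensor tile, and data movement (LOAD/STORE) is explicit and separate from compute (GEMM), which is what lets a compiler stack like TVM schedule memory transfers and computation against each other.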

