DeepAI
Log In Sign Up

Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack

04/20/2020
by   Dionysios Diamantopoulos, et al.
5

Specialized accelerators for tensor-operations, such as blocked-matrix operations and multi-dimensional convolutions, have been emerged as powerful architecture choices for high-performance Deep-Learning computing. The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor-accelerators since the adaptation to new requirements incurs significant engineering costs. Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that overlays on top of the physical FPGA configurable fabric. We propose an overlay (τ-VTA) and an optimization method guided by agile-inspired auto-tuning techniques. We achieve higher performance and faster convergence than state-of-art.

READ FULL TEXT

page 1

page 2

page 3

07/11/2018

VTA: An Open Hardware-Software Stack for Deep Learning

Hardware acceleration is an enabler for ubiquitous and efficient deep le...
05/03/2021

Bring Your Own Codegen to Deep Learning Compiler

Deep neural networks (DNNs) have been ubiquitously applied in many appli...
09/15/2021

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators

To meet the extreme compute demands for deep learning across commercial ...
05/26/2021

A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators

The rapidly-changing deep learning landscape presents a unique opportuni...
11/02/2018

CapsAcc: An Efficient Hardware Accelerator for CapsuleNets with Data Reuse

Deep Neural Networks (DNNs) have been widely deployed for many Machine L...
05/04/2021

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation

Tensor computations overwhelm traditional general-purpose computing devi...
08/03/2020

A Learned Performance Model for Tensor Processing Units

Accurate hardware performance models are critical to efficient code gene...