Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack

04/20/2020
by Dionysios Diamantopoulos, et al.

Specialized accelerators for tensor operations, such as blocked-matrix operations and multi-dimensional convolutions, have emerged as powerful architecture choices for high-performance deep-learning computing. The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor accelerators, since adapting to new requirements incurs significant engineering costs. Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that is overlaid on top of the physical FPGA configurable fabric. We propose an overlay (τ-VTA) and an optimization method guided by agile-inspired auto-tuning techniques. We achieve higher performance and faster convergence than the state of the art.
