Log In Sign Up

Agile Autotuning of a Transprecision Tensor Accelerator Overlay for TVM Compiler Stack

by   Dionysios Diamantopoulos, et al.

Specialized accelerators for tensor-operations, such as blocked-matrix operations and multi-dimensional convolutions, have been emerged as powerful architecture choices for high-performance Deep-Learning computing. The rapid development of frameworks, models, and precision options challenges the adaptability of such tensor-accelerators since the adaptation to new requirements incurs significant engineering costs. Programmable tensor accelerators offer a promising alternative by allowing reconfiguration of a virtual architecture that overlays on top of the physical FPGA configurable fabric. We propose an overlay (τ-VTA) and an optimization method guided by agile-inspired auto-tuning techniques. We achieve higher performance and faster convergence than state-of-art.


page 1

page 2

page 3


VTA: An Open Hardware-Software Stack for Deep Learning

Hardware acceleration is an enabler for ubiquitous and efficient deep le...

Bring Your Own Codegen to Deep Learning Compiler

Deep neural networks (DNNs) have been ubiquitously applied in many appli...

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators

To meet the extreme compute demands for deep learning across commercial ...

A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators

The rapidly-changing deep learning landscape presents a unique opportuni...

CapsAcc: An Efficient Hardware Accelerator for CapsuleNets with Data Reuse

Deep Neural Networks (DNNs) have been widely deployed for many Machine L...

HASCO: Towards Agile HArdware and Software CO-design for Tensor Computation

Tensor computations overwhelm traditional general-purpose computing devi...

A Learned Performance Model for Tensor Processing Units

Accurate hardware performance models are critical to efficient code gene...