A Dense Tensor Accelerator with Data Exchange Mesh for DNN and Vision Workloads

11/25/2021
by   Yu-Sheng Lin, et al.

We propose VectorMesh, a dense tensor accelerator with a scalable, memory-efficient architecture that supports a wide variety of DNN and computer vision workloads. Its building block is a tile execution unit (TEU), which includes dozens of processing elements (PEs) and SRAM buffers connected through a butterfly network. A mesh of FIFOs between the TEUs facilitates data exchange between tiles and promotes local data to global visibility. Under the roofline model, our design outperforms state-of-the-art architectures on CNN, GEMM, and spatial matching algorithms. It can reduce global buffer fetches by 2-22 times and DRAM fetches by up to 5 times.
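
To make the roofline argument concrete, the following is a minimal sketch of how a reduction in DRAM fetches translates into attainable throughput. The roofline bound itself is standard (attainable performance is the smaller of peak compute and memory bandwidth times arithmetic intensity); the function name and every number below (peak throughput, bandwidth, operation count, traffic) are hypothetical values chosen for illustration and are not taken from the paper. Only the 5x traffic reduction echoes the upper figure quoted in the abstract.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops, bytes_moved):
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    intensity = flops / bytes_moved  # FLOP per byte of DRAM traffic
    return min(peak_gflops, bandwidth_gbs * intensity)


# Hypothetical machine and workload parameters (assumptions, not paper data).
PEAK_GFLOPS = 1000.0    # peak compute throughput
BANDWIDTH_GBS = 12.0    # DRAM bandwidth in GB/s
WORKLOAD_FLOPS = 2e9    # total operations in the workload
BASELINE_BYTES = 1e8    # DRAM traffic of a baseline design

# Baseline: intensity is 20 FLOP/byte, so the design is memory-bound.
baseline = attainable_gflops(
    PEAK_GFLOPS, BANDWIDTH_GBS, WORKLOAD_FLOPS, BASELINE_BYTES
)

# With 5x fewer DRAM fetches the intensity rises to 100 FLOP/byte,
# and the bound is now capped by peak compute instead of bandwidth.
reduced = attainable_gflops(
    PEAK_GFLOPS, BANDWIDTH_GBS, WORKLOAD_FLOPS, BASELINE_BYTES / 5
)

print(f"baseline bound: {baseline} GFLOP/s")
print(f"reduced-traffic bound: {reduced} GFLOP/s")
```

In this toy setting, cutting memory traffic moves the workload from the bandwidth-limited slope of the roofline onto the compute-limited plateau, which is the kind of improvement the roofline comparison is meant to capture.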

research
08/01/2021

Data Streaming and Traffic Gathering in Mesh-based NoC for Deep Neural Network Acceleration

The increasing popularity of deep neural network (DNN) applications dema...
research
09/21/2022

In-Network Accumulation: Extending the Role of NoC for DNN Acceleration

Network-on-Chip (NoC) plays a significant role in the performance of a D...
research
08/10/2022

A Fresh Perspective on DNN Accelerators by Performing Holistic Analysis Across Paradigms

Traditional computers with von Neumann architecture are unable to meet t...
research
11/07/2019

MERIT: Tensor Transform for Memory-Efficient Vision Processing on Parallel Architectures

Computationally intensive deep neural networks (DNNs) are well-suited to...
research
10/07/2021

MAPA: Multi-Accelerator Pattern Allocation Policy for Multi-Tenant GPU Servers

Multi-accelerator servers are increasingly being deployed in shared mult...
research
05/16/2020

Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference

Convolutional neural network (CNN) inference on mobile devices demands e...
