Data Streaming and Traffic Gathering in Mesh-based NoC for Deep Neural Network Acceleration

08/01/2021
by Binayak Tiwari, et al.

The increasing popularity of deep neural network (DNN) applications demands high computing power and efficient hardware accelerator architectures. DNN accelerators use a large number of processing elements (PEs) and on-chip memory for storing weights and other parameters. As the communication backbone of a DNN accelerator, the network-on-chip (NoC) plays an important role in supporting various dataflow patterns and in enabling processing and communication to proceed in parallel. However, the widely used mesh-based NoC architectures cannot efficiently support the one-to-many and many-to-one traffic that is prevalent in DNN workloads. In this paper, we propose a modified mesh architecture with a one-way/two-way streaming bus to speed up one-to-many (multicast) traffic, and the use of gather packets to support many-to-one (gather) traffic. An analysis of the runtime latency of a convolutional layer shows that the two-way streaming architecture achieves a greater improvement than the one-way streaming architecture for an Output Stationary (OS) dataflow architecture. Simulation results demonstrate that gather packets can reduce runtime latency by up to 1.8 times and network power consumption by up to 1.7 times, compared with the repetitive unicast method on modified mesh architectures supporting two-way streaming.
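To give a rough intuition for why a gather packet can beat repetitive unicast on many-to-one traffic, the sketch below counts packet-level link traversals in a single mesh row. It is a toy model under stated assumptions, not the paper's simulator: the sink (for example, a memory element) is assumed to sit one hop past the east end of the row, every PE sends exactly one result, and the function names are hypothetical. It ignores flit counts, payload growth of the gather packet, contention, and power, so it does not reproduce the 1.8x or 1.7x figures reported above.

```python
def unicast_link_traversals(num_pes: int) -> int:
    """Repetitive unicast: every PE in the row sends its own packet to the sink.

    Assumption: PEs occupy columns 0..num_pes-1 and the sink sits at
    column num_pes, so the PE at column c is (num_pes - c) hops away.
    """
    return sum(num_pes - c for c in range(num_pes))


def gather_link_traversals(num_pes: int) -> int:
    """Gather: one packet starts at the farthest PE (column 0) and picks up
    each PE's payload as it passes, so only one packet crosses each link.
    """
    return num_pes


if __name__ == "__main__":
    for n in (4, 8, 16):
        u = unicast_link_traversals(n)
        g = gather_link_traversals(n)
        print(f"row of {n} PEs: unicast={u} traversals, gather={g} traversals")
```

In this simplified model the unicast count grows quadratically with the row width while the gather count grows linearly. The gather packet does carry more payload per hop, which the model does not account for.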

research
08/01/2021

Improving the Performance of a NoC-based CNN Accelerator with Gather Support

The increasing application of deep learning technology drives the need f...
research
09/21/2022

In-Network Accumulation: Extending the Role of NoC for DNN Acceleration

Network-on-Chip (NoC) plays a significant role in the performance of a D...
research
07/31/2023

PATRONoC: Parallel AXI Transport Reducing Overhead for Networks-on-Chip targeting Multi-Accelerator DNN Platforms at the Edge

Emerging deep neural network (DNN) applications require high-performance...
research
11/25/2021

A Dense Tensor Accelerator with Data Exchange Mesh for DNN and Vision Workloads

We propose a dense tensor accelerator called VectorMesh, a scalable, mem...
research
04/06/2019

Ring-Mesh: A Scalable and High-Performance Approach for Manycore Accelerators

There is an increasing number of works addressing the design challenge o...
research
05/04/2023

A Quantitative Analysis and Guideline of Data Streaming Accelerator in Intel 4th Gen Xeon Scalable Processors

As semiconductor power density is no longer constant with the technology...
research
09/26/2022

FastStamp: Accelerating Neural Steganography and Digital Watermarking of Images on FPGAs

Steganography and digital watermarking are the tasks of hiding recoverab...
