Ring-Mesh: A Scalable and High-Performance Approach for Manycore Accelerators

04/06/2019
by   Somnath Mazumdar, et al.
0

There is an increasing number of works addressing the design challenge of fast, scalable solutions for the growing machine learning based application domain. Recently, most of the solutions aimed at improving processing element capabilities to speed up the execution of deep learning (DL) application. However, only a few works focused on the interconnection subsystem as a potential source of performance improvement. Wrapping many cores together offer excellent parallelism, but it comes with multiple challenges (e.g., adequate interconnections). Scalable, power-aware interconnects are required to support such a growing number of processing elements, as well as modern applications. In this paper, we propose a scalable and efficient Network-on-Chip (NoC) architecture fusing the advantages of rings as well as the 2D-mesh without using any bridge router to provide high-performance. A dynamic adaptation mechanism allows to better adapt to the application requirements. Simulation results show better scalability (up to 1024 processing elements) with robust performance in multiple statistical traffic pattern scenarios.

READ FULL TEXT
research
08/01/2021

Data Streaming and Traffic Gathering in Mesh-based NoC for Deep Neural Network Acceleration

The increasing popularity of deep neural network (DNN) applications dema...
research
08/01/2021

Improving the Performance of a NoC-based CNN Accelerator with Gather Support

The increasing application of deep learning technology drives the need f...
research
12/21/2020

Efficient On-Chip Learning for Optical Neural Networks Through Power-Aware Sparse Zeroth-Order Optimization

Optical neural networks (ONNs) have demonstrated record-breaking potenti...
research
08/14/2021

SIAM: Chiplet-based Scalable In-Memory Acceleration with Mesh for Deep Neural Networks

In-memory computing (IMC) on a monolithic chip for deep learning faces d...
research
06/29/2017

Fast Processing of Large Graph Applications Using Asynchronous Architecture

Graph algorithms and techniques are increasingly being used in scientifi...
research
07/10/2018

Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks

The design of DNNs has increasingly focused on reducing the computationa...
research
03/22/2016

Information Processing by Nonlinear Phase Dynamics in Locally Connected Arrays

Research toward powerful information processing systems that circumvent ...

Please sign up or login with your details

Forgot password? Click here to reset