Ring-Mesh: A Scalable and High-Performance Approach for Manycore Accelerators

04/06/2019

∙

There is an increasing number of works addressing the design challenge of fast, scalable solutions for the growing machine learning based application domain. Recently, most of the solutions aimed at improving processing element capabilities to speed up the execution of deep learning (DL) application. However, only a few works focused on the interconnection subsystem as a potential source of performance improvement. Wrapping many cores together offer excellent parallelism, but it comes with multiple challenges (e.g., adequate interconnections). Scalable, power-aware interconnects are required to support such a growing number of processing elements, as well as modern applications. In this paper, we propose a scalable and efficient Network-on-Chip (NoC) architecture fusing the advantages of rings as well as the 2D-mesh without using any bridge router to provide high-performance. A dynamic adaptation mechanism allows to better adapt to the application requirements. Simulation results show better scalability (up to 1024 processing elements) with robust performance in multiple statistical traffic pattern scenarios.

READ FULL TEXT

Ring-Mesh: A Scalable and High-Performance Approach for Manycore Accelerators

Sign in with Google

Consider DeepAI Pro