On-Chip Communication Network for Efficient Training of Deep Convolutional Networks on Heterogeneous Manycore Systems

by   Wonje Choi, et al.

Convolutional Neural Networks (CNNs) have shown a great deal of success in diverse application domains including computer vision, speech recognition, and natural language processing. However, as the size of datasets and the depth of neural network architectures continue to grow, it is imperative to design high-performance and energy-efficient computing hardware for training CNNs. In this paper, we consider the problem of designing specialized CPU-GPU based heterogeneous manycore systems for energy-efficient training of CNNs. It has already been shown that the typical on-chip communication infrastructures employed in conventional CPU-GPU based heterogeneous manycore platforms are unable to handle both CPU and GPU communication requirements efficiently. To address this issue, we first analyze the on-chip traffic patterns that arise from the computational processes associated with training two deep CNN architectures, namely, LeNet and CDBNet, to perform image classification. By leveraging this knowledge, we design a hybrid Network-on-Chip (NoC) architecture, which consists of both wireline and wireless links, to improve the performance of CPU-GPU based heterogeneous manycore platforms running the above-mentioned CNN training workloads. The proposed NoC achieves 1.8x reduction in network latency and improves the network throughput by a factor of 2.2 for training CNNs, when compared to a highly-optimized wireline mesh NoC. For the considered CNN workloads, these network-level improvements translate into 25 that the proposed hybrid NoC for heterogeneous manycore architectures is capable of significantly accelerating training of CNNs while remaining energy-efficient.



There are no comments yet.


page 1

page 5

page 9

page 10

page 12

page 13


Learning-based Application-Agnostic 3D NoC Design for Heterogeneous Manycore Systems

The rising use of deep learning and other big-data algorithms has led to...

ReaLPrune: ReRAM Crossbar-aware Lottery Ticket Pruned CNNs

ReRAM-based architectures offer high-performance yet energy efficient co...

Caffe con Troll: Shallow Ideas to Speed Up Deep Learning

We present Caffe con Troll (CcT), a fully compatible end-to-end version ...

Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design

In recent years, the CNNs have achieved great successes in the image pro...

System-level optimization of Network-on-Chips for heterogeneous 3D System-on-Chips

For a system-level design of Networks-on-Chip for 3D heterogeneous Syste...

Improving the Performance of a NoC-based CNN Accelerator with Gather Support

The increasing application of deep learning technology drives the need f...

Understanding the Impact of On-chip Communication on DNN Accelerator Performance

Deep Neural Networks have flourished at an unprecedented pace in recent ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.