Accelerating Fully Connected Neural Network on Optical Network-on-Chip (ONoC)

by Fei Dai, et al.

A Fully Connected Neural Network (FCNN) is a class of artificial neural network widely used in computer science and engineering, but its training can take a long time on large datasets in existing many-core systems. Optical Network-on-Chip (ONoC), an emerging chip-scale optical interconnection technology, has great potential to accelerate FCNN training thanks to its low transmission delay, low power consumption, and high throughput. However, existing methods designed for Electrical Network-on-Chip (ENoC) do not carry over to ONoC because of ONoC's unique properties. In this paper, we propose a fine-grained parallel computing model for accelerating FCNN training on ONoC and derive the optimal number of cores for each execution stage, with the objective of minimizing the total time needed to complete one epoch of FCNN training. To allocate the optimal number of cores to each execution stage, we present three mapping strategies and compare their advantages and disadvantages in terms of hotspot level, memory requirement, and state transitions. Simulation results show that the average prediction error for the optimal number of cores in NN benchmarks is within 2.3%. Further simulations demonstrate that FCNN training time can be reduced by 22.28% and 4.91% compared with parallel computing methods that either allocate a fixed number of cores or allocate as many cores as possible, respectively. Compared with ENoC, simulation results show that under batch sizes of 64 and 128, on average ONoC can achieve 21.02 on saving energy, respectively.
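Why an optimal per-stage core count exists at all can be illustrated with a toy cost model. The sketch below assumes, purely for illustration, that each execution stage's compute time falls as W/m with m cores while a per-core communication/synchronization overhead grows as c·m. The `Stage` class, the `work` and `comm_per_core` values, and the 64-core budget are all hypothetical; this is not the paper's ONoC model, which accounts for optical transmission properties not captured here. Under this assumption the continuous optimum is m ≈ sqrt(W/c), and the code searches the integers exhaustively.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """One execution stage (e.g. one FC layer) in a HYPOTHETICAL toy cost model."""
    name: str
    work: float           # total compute in arbitrary units (assumed)
    comm_per_core: float  # overhead added per allocated core (assumed)


def stage_time(stage: Stage, cores: int) -> float:
    # Compute shrinks as 1/cores; communication overhead grows linearly with cores.
    return stage.work / cores + stage.comm_per_core * cores


def optimal_cores(stage: Stage, max_cores: int) -> int:
    # Exhaustive search over 1..max_cores; cheap for on-chip core counts.
    return min(range(1, max_cores + 1), key=lambda m: stage_time(stage, m))


def epoch_time(stages: List[Stage], allocation: List[int]) -> float:
    # Toy "epoch": stages run one after another, so their times add up.
    return sum(stage_time(s, m) for s, m in zip(stages, allocation))


# Made-up stages; work values chosen so sqrt(work / comm) is an integer.
STAGES = [
    Stage("fc1", work=1600.0, comm_per_core=1.0),
    Stage("fc2", work=400.0, comm_per_core=1.0),
    Stage("fc3", work=25.0, comm_per_core=1.0),
]
MAX_CORES = 64

if __name__ == "__main__":
    best = [optimal_cores(s, MAX_CORES) for s in STAGES]
    print(best)  # [40, 20, 5]
    print("per-stage optimum:", epoch_time(STAGES, best))
    print("fixed 16 cores:   ", epoch_time(STAGES, [16] * len(STAGES)))
    print("all 64 cores:     ", epoch_time(STAGES, [MAX_CORES] * len(STAGES)))
```

With these made-up numbers, the per-stage optimum (40, 20 and 5 cores) yields a lower total than either a fixed 16 cores or all 64 cores per stage. This qualitatively mirrors the comparison against fixed and as-many-as-possible allocation, but the magnitudes say nothing about the paper's results.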


Pareto-Optimization Framework for Automated Network-on-Chip Design

With the advent of multi-core processors, network-on-chip design has bee...

Reducing Memory Requirements for the IPU using Butterfly Factorizations

High Performance Computing (HPC) benefits from different improvements du...

An Artificial Neural Networks based Temperature Prediction Framework for Network-on-Chip based Multicore Platform

Continuous improvement in silicon process technologies has made possible...

ANDROMEDA: An FPGA Based RISC-V MPSoC Exploration Framework

With the growing demands of consumer electronic products, the computatio...

An optimal scheduling architecture for accelerating batch algorithms on Neural Network processor architectures

In neural network topologies, algorithms are running on batches of data ...

Parallelizing Bisection Root-Finding: A Case for Accelerating Serial Algorithms in Multicore Substrates

Multicore architectures dominate today's processor market. Even though t...

3D Field Simulation Model for Bond Wire On-Chip Inductors Validated by Measurements

This paper proposes 3D field simulation models for different designs of ...
