Mini-batch Serialization: CNN Training with Inter-layer Data Reuse

09/30/2018
by Sangkug Lym, et al.

Training convolutional neural networks (CNNs) requires intense computations and high memory bandwidth. We find that bandwidth today is over-provisioned because most memory accesses in CNN training can be eliminated by rearranging computation to better utilize on-chip buffers and avoid traffic resulting from large per-layer memory footprints. We introduce the mini-batch serialization (MBS) CNN training approach, which significantly reduces memory traffic by partially serializing mini-batch processing across groups of layers. This optimizes reuse within on-chip buffers and balances both intra-layer and inter-layer reuse. We also introduce the WaveCore CNN training accelerator, which effectively trains CNNs with the MBS approach at high functional-unit utilization. Combined, WaveCore and MBS reduce DRAM traffic by 73% and save system energy for modern deep CNN training compared to conventional training mechanisms and accelerators.

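To make the scheduling idea concrete, the sketch below contrasts a conventional layer-by-layer schedule with an MBS-style schedule that runs one sub-batch at a time through a whole group of layers. This is only an illustration of the loop reordering, not the paper's implementation: the matrix-multiply "layers", the group of four layers, the sub-batch size of 32, and the function names are assumptions made for the example.

```python
import numpy as np

# Toy "layers": each is a matrix multiply plus ReLU standing in for a
# convolutional layer. Four layers form one layer group.
rng = np.random.default_rng(0)
layer_weights = [rng.standard_normal((64, 64)) * 0.1 for _ in range(4)]

def apply_layer(x, w):
    return np.maximum(x @ w, 0.0)

def conventional_forward(batch):
    """Process the whole mini-batch one layer at a time.

    The full mini-batch's activations are materialized between layers,
    so on an accelerator they would exceed the on-chip buffer and spill
    to off-chip memory."""
    x = batch
    for w in layer_weights:
        x = apply_layer(x, w)
    return x

def mbs_forward(batch, sub_batch_size):
    """MBS-style schedule: split the mini-batch into sub-batches and run
    each sub-batch through the whole layer group before starting the next,
    so only a sub-batch's activations are live between layers."""
    outputs = []
    for start in range(0, len(batch), sub_batch_size):
        x = batch[start:start + sub_batch_size]
        for w in layer_weights:
            x = apply_layer(x, w)
        outputs.append(x)
    return np.concatenate(outputs)

batch = rng.standard_normal((256, 64))
# Both schedules compute the same result; only the loop order differs.
assert np.allclose(conventional_forward(batch), mbs_forward(batch, 32))
```

The point of the reordering is that only a sub-batch's activations are live between layers, so they can stay in on-chip buffers instead of spilling to DRAM; the cost is less weight reuse across the full mini-batch within each layer, which is why MBS serializes only partially and balances intra-layer against inter-layer reuse.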