A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets

02/19/2018
by Fabian Schuiki, et al.

Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth analysis of the key computational patterns in state-of-the-art gradient-based training methods, we propose an efficient near-memory acceleration engine called NTX that can be used to train state-of-the-art deep convolutional neural networks at scale. Our main contributions are: (i) identifying the requirements for efficient data address generation and developing an accelerator offloading scheme that reduces overhead by 7x over previously published results; and (ii) supporting a rich set of operations that allows for efficient calculation of the back-propagation phase. The low control overhead allows up to 8 NTX engines to be controlled by a simple processor. Evaluations in a near-memory computing scenario, where the accelerator is placed on the logic base die of a Hybrid Memory Cube, demonstrate a 2.6x energy efficiency improvement over contemporary GPUs at 4.4x less silicon area, and an average compute performance of 1.01 Tflop/s for training large state-of-the-art networks with full floating-point precision. The architecture is scalable and paves the way towards efficient deep learning in a distributed near-memory setting.
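As a rough illustration of the computational pattern the abstract alludes to, the sketch below models a generalized multiply-accumulate reduction whose operand addresses are produced by affine (base plus strides over nested loop bounds) address generation, the kind of loop nest that the forward and back-propagation passes of a convolutional layer reduce to. This is only a software model under stated assumptions: the function name, parameters, and the convolution example are illustrative and do not reflect NTX's actual offloading interface or instruction set.

```python
import numpy as np

def mac_reduction(a, b, base_a, base_b, strides_a, strides_b, bounds):
    """Illustrative software model of a hardware-loop MAC reduction.

    Nested counters (one per entry in `bounds`) generate addresses into the
    flat buffers `a` and `b` via affine strides, and the products are
    accumulated into a single result. A streaming engine for generalized
    reductions implements this pattern in hardware, so the controlling
    processor only has to configure bases, strides, and bounds.
    """
    acc = 0.0
    # np.ndindex plays the role of the hardware loop counters.
    for idx in np.ndindex(*bounds):
        addr_a = base_a + sum(i * s for i, s in zip(idx, strides_a))
        addr_b = base_b + sum(i * s for i, s in zip(idx, strides_b))
        acc += a[addr_a] * b[addr_b]
    return acc


# Example (hypothetical sizes): one output pixel of a 3x3 convolution over
# C input channels, expressed as a single MAC reduction with affine
# address generation.
C, H, W, K = 4, 8, 8, 3
x = np.random.rand(C * H * W).astype(np.float32)   # flattened input
w = np.random.rand(C * K * K).astype(np.float32)   # flattened kernel

out = mac_reduction(
    x, w,
    base_a=0, base_b=0,
    strides_a=(H * W, W, 1),   # step over channel, row, column of the input
    strides_b=(K * K, K, 1),   # matching steps through the kernel
    bounds=(C, K, K),
)
```

Because the whole index space is described by a handful of bases, strides, and bounds, each offloaded reduction needs only a small configuration write, which is consistent with the abstract's claim that a single simple processor can keep several NTX engines busy.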


Related research

12/01/2018
NTX: An Energy-efficient Streaming Accelerator for Floating-point Generalized Reduction Workloads in 22nm FD-SOI
Specialized coprocessors for Multiply-Accumulate (MAC) intensive workloa...

03/02/2020
A New MRAM-based Process In-Memory Accelerator for Efficient Neural Network Training with Floating Point Precision
The excellent performance of modern deep neural networks (DNNs) comes at...

02/10/2021
Hybrid In-memory Computing Architecture for the Training of Deep Neural Networks
The cost involved in training deep neural networks (DNNs) on von-Neumann...

03/13/2022
FlexBlock: A Flexible DNN Training Accelerator with Multi-Mode Block Floating Point Support
Training deep neural networks (DNNs) is a computationally expensive job,...

02/15/2022
HiMA: A Fast and Scalable History-based Memory Access Engine for Differentiable Neural Computer
Memory-augmented neural networks (MANNs) provide better inference perfor...

05/13/2021
Combining Emulation and Simulation to Evaluate a Near Memory Key/Value Lookup Accelerator
Processing large numbers of key/value lookups is an integral part of mod...

11/15/2019
NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units
To satisfy the compute and memory demands of deep neural networks, neura...
