HiMA: A Fast and Scalable History-based Memory Access Engine for Differentiable Neural Computer

02/15/2022
by Yaoyu Tao, et al.

Memory-augmented neural networks (MANNs) deliver better inference performance in many tasks with the help of an external memory. The recently developed differentiable neural computer (DNC) is a MANN that has been shown to excel at representing complicated data structures and learning long-term dependencies. The DNC's higher performance derives from new history-based attention mechanisms that supplement the previously used content-based attention mechanisms. History-based mechanisms require a variety of new compute primitives and state memories that are not supported by existing neural network (NN) or MANN accelerators. We present HiMA, a tiled, history-based memory access engine with distributed memories in tiles. HiMA incorporates a multi-mode network-on-chip (NoC) to reduce communication latency and improve scalability. An optimal submatrix-wise memory partition strategy reduces the amount of NoC traffic, and a two-stage usage sort method leverages distributed tiles to improve computation speed. To make HiMA fundamentally scalable, we create a distributed version of DNC, called DNC-D, that allows almost all memory operations to be applied to local memories, with a trainable weighted summation producing the global memory output. Two approximation techniques, usage skimming and softmax approximation, further enhance hardware efficiency. HiMA prototypes are designed in RTL and synthesized in a 40nm technology. In simulations, HiMA running DNC and DNC-D demonstrates 6.47x and 39.1x higher speed, 22.8x and 164.3x better area efficiency, and 6.1x and 61.2x better energy efficiency, respectively, over the state-of-the-art MANN accelerator. Compared to an Nvidia 3080Ti GPU, HiMA demonstrates speedups of up to 437x and 2,646x when running DNC and DNC-D, respectively.
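The DNC-D idea described above can be illustrated with a minimal sketch: each tile performs a content-based read on its own local memory, and a trainable per-tile weight vector combines the local reads into one global read vector. All function names and shapes here are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax
    e = np.exp(x - x.max())
    return e / e.sum()

def local_read(memory, key):
    # Content-based attention on one tile's local memory:
    # cosine similarity -> softmax -> weighted sum of memory rows
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    attn = softmax(sims)
    return attn @ memory

def global_read(tile_memories, key, tile_weights):
    # DNC-D-style combination (as described in the abstract): a trainable
    # weighted summation of per-tile local reads yields the global output
    reads = np.stack([local_read(m, key) for m in tile_memories])
    return softmax(tile_weights) @ reads

rng = np.random.default_rng(0)
tiles = [rng.standard_normal((8, 4)) for _ in range(4)]  # 4 tiles, each an 8x4 local memory
key = rng.standard_normal(4)                             # read key
w = rng.standard_normal(4)                               # trainable per-tile weights
out = global_read(tiles, key, w)
print(out.shape)  # (4,)
```

Because each `local_read` touches only its own tile's memory, the per-tile work stays local and only the small read vectors cross the NoC, which is what makes the distributed formulation scalable.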


