A Low-Power Accelerator for Deep Neural Networks with Enlarged Near-Zero Sparsity

05/22/2017
by Yuxiang Huan, et al.

Running Deep Learning on devices with a stringent power budget remains a challenge in the Internet of Things. This paper presents a low-power accelerator for processing Deep Neural Networks in embedded devices. Power is reduced by avoiding multiplications of near-zero-valued data. A near-zero approximation and a dedicated Near-Zero Approximation Unit (NZAU) are proposed to predict and skip near-zero multiplications that fall under given thresholds. Compared with skipping only zero-valued computations, our design achieves a further 1.92X and 1.51X reduction of the total multiplications in LeNet-5 and AlexNet respectively, with negligible loss of accuracy. In the proposed accelerator, 256 multipliers are grouped into 16 independent Processing Lanes (PLs) to support up to 16 neuron activations simultaneously. With data pre-processing and buffering in each PL, the multipliers can be clock-gated most of the time, even when data streams in continuously. Designed and simulated in a UMC 65 nm process, the accelerator operating at 500 MHz is more than 4X faster than the mobile GPU Tegra K1 in processing the fully-connected layer FC8 of AlexNet, while consuming 717X less energy.
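The skip logic described above can be illustrated with a minimal software sketch. The function below is an assumption for illustration only (the paper's NZAU is a hardware unit, and the threshold values here are hypothetical, not taken from the paper): it computes a dot product while treating any product whose weight or activation magnitude falls below a threshold as zero, and counts how many multiplications were skipped.

```python
def near_zero_dot(weights, activations, w_th=0.01, a_th=0.01):
    """Dot product that skips near-zero multiplications.

    Products where |weight| or |activation| is below the threshold
    are treated as zero, mimicking the kind of skip decision an
    NZAU-style predictor makes. Thresholds are illustrative values,
    not the paper's.
    """
    total = 0.0
    skipped = 0
    for w, a in zip(weights, activations):
        if abs(w) < w_th or abs(a) < a_th:
            skipped += 1  # the multiplier would be clock-gated here
            continue
        total += w * a
    return total, skipped


# Example: two of the three products involve near-zero operands.
result, skipped = near_zero_dot([0.5, 0.001, 0.3], [1.0, 2.0, 0.004])
```

In this example only the first product (0.5 × 1.0) is computed; the other two are skipped, which is the source of the multiplication savings the paper reports.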

