Energy-efficient Dense DNN Acceleration with Signed Bit-slice Architecture

03/15/2022
by Dongseok Im, et al.

As the number of deep neural networks (DNNs) executed on a mobile system-on-chip (SoC) increases, the mobile SoC struggles to accelerate DNNs in real time within its limited hardware resources and power budget. Although previous mobile neural processing units (NPUs) exploit low-bit computation and sparsity, they cannot accelerate high-precision and dense DNNs. This paper proposes an energy-efficient signed bit-slice architecture that accelerates both high-precision and dense DNNs by exploiting the large number of zero values among signed bit-slices. The proposed signed bit-slice representation (SBR) changes a signed 1111₂ bit-slice to 0000₂ by borrowing a 1 from its lower-order bit-slice. As a result, it generates a large number of zero bit-slices even in dense DNNs. Moreover, it balances the positive and negative values of 2's complement data, enabling bit-slice-based output speculation, which pre-computes the high-order bit-slices and skips the remaining dense low-order bit-slices. The signed bit-slice architecture compresses and skips zero input signed bit-slices, and its zero-skipping unit also supports output skipping by masking the speculated inputs to zero. Additionally, a heterogeneous network-on-chip (NoC) improves data reuse and reduces transmission bandwidth. The paper introduces a specialized instruction set architecture (ISA) and a hierarchical instruction decoder to control the signed bit-slice architecture. Finally, the signed bit-slice architecture outperforms the previous bit-slice accelerator, Bit-fusion, with 3.65× higher area efficiency, 3.88× higher energy efficiency, and 5.35× higher throughput.
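The abstract describes SBR only at a high level. The Python sketch below illustrates the idea under stated assumptions: 8-bit 2's complement values split into two 4-bit slices, with "borrowing" read as signed-digit recoding, so that whenever a lower slice's MSB is set, that slice is treated as a signed 4-bit value and 1 is added to the next higher slice. The function names (`twos_complement_slices`, `signed_bit_slices`, `count_zero_slices`), the slice width, and the sample weights are hypothetical and chosen for illustration; this is not the paper's hardware implementation.

```python
from typing import List

SLICE_BITS = 4                  # assumed slice width (4-bit slices of an 8-bit value)
BASE = 1 << SLICE_BITS          # 16
HALF = BASE >> 1                # 8: a slice >= 8 has its MSB set


def twos_complement_slices(x: int, num_slices: int = 2) -> List[int]:
    """Conventional slicing, most significant slice first.

    Lower slices are unsigned in [0, 15]; only the top slice carries the sign,
    so a small negative value such as -3 gets a top slice of -1 (1111)."""
    slices = []
    for _ in range(num_slices - 1):
        slices.append(x & (BASE - 1))   # unsigned lower slice
        x >>= SLICE_BITS                # Python's >> is an arithmetic shift
    slices.append(x)                    # signed top slice
    return slices[::-1]


def signed_bit_slices(x: int, num_slices: int = 2) -> List[int]:
    """Hypothetical SBR-style recoding, most significant slice first.

    A lower slice whose MSB is set is read as a signed 4-bit value (minus 16),
    and the borrowed 1 is added to the next higher slice. For -3 this turns
    the top slice 1111 into 0000 while the low slice bits stay 1101."""
    slices = []
    for _ in range(num_slices - 1):
        d = x & (BASE - 1)
        if d >= HALF:
            d -= BASE                   # signed digit in [-8, 7]
        slices.append(d)
        x = (x - d) >> SLICE_BITS       # exact: x - d is a multiple of 16
    # For 8-bit inputs the top slice lies in [-8, 8]; +8 (inputs 120..127)
    # would need one extra bit, an edge case this sketch does not resolve.
    slices.append(x)
    return slices[::-1]


def value(slices: List[int]) -> int:
    """Reconstruct the integer from slices given most significant first."""
    total = 0
    for s in slices:
        total = total * BASE + s
    return total


def bits(s: int) -> str:
    """4-bit two's complement pattern of a slice, for display."""
    return format(s & (BASE - 1), "04b")


def count_zero_slices(rows: List[List[int]]) -> int:
    """Number of all-zero slices across a list of sliced values."""
    return sum(sl == 0 for row in rows for sl in row)


if __name__ == "__main__":
    weights = [-3, -1, 2, 5, -7, 0]     # hypothetical small-magnitude weights
    conv = [twos_complement_slices(w) for w in weights]
    sbr = [signed_bit_slices(w) for w in weights]
    for w, c, s in zip(weights, conv, sbr):
        print(f"{w:3d}  2's comp: {' '.join(map(bits, c))}  SBR: {' '.join(map(bits, s))}")
    print("zero slices:", count_zero_slices(conv), "vs", count_zero_slices(sbr))
```

For -3 the script prints `1111 1101` for the conventional slices and `0000 1101` for the recoded ones: the low slice keeps its bit pattern but is now read as signed, and the high slice becomes zero. Across the six sample weights the zero-slice count rises from 4 to 7, which mirrors the abstract's point that small-magnitude negative values stop producing dense 1111₂ high-order slices.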
