Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks

12/05/2017
by   Hardik Sharma, et al.
0

Hardware acceleration of Deep Neural Networks (DNNs) aims to tame their enormous compute intensity. Fully realizing the potential of acceleration in this domain requires understanding and leveraging algorithmic properties of DNNs. This paper builds upon the algorithmic insight that bitwidth of operations in DNNs can be reduced without compromising their accuracy. However, to prevent accuracy loss, the bitwidth varies significantly across DNNs and it may even be adjusted for each layer individually. Thus, a fixed-bitwidth accelerator would either offer limited benefits to accommodate the worst-case bitwidth, or inevitably lead to a degradation in final accuracy. To alleviate these deficiencies, this work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators. We explore this dimension by designing Bit Fusion, a bit-flexible accelerator, that constitutes an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers. This flexibility in the architecture minimizes the computation and the communication at the finest granularity possible with no loss in accuracy. We evaluate the benefits of Bit Fusion using eight real-world feed-forward and recurrent DNNs. The proposed microarchitecture is implemented in Verilog and synthesized in 45 nm technology. Using the synthesis results and cycle accurate simulation, we compare the benefits of Bit Fusion to two state-of-the-art DNN accelerators, Eyeriss and Stripes. In the same area, frequency, and technology node, Bit Fusion offers 4.3x speedup and 9.6x energy savings over Eyeriss. Bit Fusion provides 2.4x speedup and 4.1x energy reduction over Stripes at 45 nm node when Bit Fusion area and frequency are set to those of Stripes. Compared to Jetson-TX2, Bit Fusion offers 4.3x speedup and almost matches the performance of TitanX, which is 4.6x faster than TX2.

READ FULL TEXT

page 1

page 4

page 6

page 8

page 9

page 10

research
11/25/2020

Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration

Precision scaling has emerged as a popular technique to optimize the com...
research
04/11/2020

Bit-Parallel Vector Composability for Neural Acceleration

Conventional neural accelerators rely on isolated self-sufficient functi...
research
09/15/2019

TiM-DNN: Ternary in-Memory accelerator for Deep Neural Networks

The use of lower precision has emerged as a popular technique to optimiz...
research
06/27/2019

Mixed-Signal Charge-Domain Acceleration of Deep Neural networks through Interleaved Bit-Partitioned Arithmetic

Low-power potential of mixed-signal design makes it an alluring option t...
research
03/15/2022

Energy-efficient Dense DNN Acceleration with Signed Bit-slice Architecture

As the number of deep neural networks (DNNs) to be executed on a mobile ...
research
11/11/2020

DLFusion: An Auto-Tuning Compiler for Layer Fusion on Deep Neural Network Accelerator

Many hardware vendors have introduced specialized deep neural networks (...
research
01/27/2022

On the RTL Implementation of FINN Matrix Vector Compute Unit

FPGA-based accelerators are becoming more popular for deep neural networ...

Please sign up or login with your details

Forgot password? Click here to reset