Bit-pragmatic Deep Neural Network Computing

10/20/2016
by J. Albericio, et al.

We quantify a source of ineffectual computations when processing the multiplications of the convolutional layers in Deep Neural Networks (DNNs) and propose Pragmatic (PRA), an architecture that exploits it to improve performance and energy efficiency. The source of these ineffectual computations is best understood in the context of conventional multipliers, which internally generate multiple terms, that is, products of the multiplicand and powers of two, which added together produce the final product [1]. At runtime, many of these terms are zero, as they are generated when the multiplicand is combined with the zero bits of the multiplicator. While conventional bit-parallel multipliers calculate all terms in parallel to reduce individual product latency, PRA calculates only the non-zero terms using a) on-the-fly conversion of the multiplicator representation into an explicit list of powers of two, and b) hybrid bit-parallel multiplicand/bit-serial multiplicator processing units. PRA exploits two sources of ineffectual computations: 1) the aforementioned zero product terms, which result from the lack of explicitness in the multiplicator representation, and 2) the excess precision used to represent both multiplicands and multiplicators, e.g., [2]. Measurements demonstrate that for the convolutional layers, a straightforward variant of PRA improves performance by 2.6x over the DaDianNao (DaDN) accelerator [3] and by 1.4x over STR [4]. Similarly, PRA improves energy efficiency by 28% on average compared to DaDN and STR. An improved cross-lane synchronization scheme boosts the performance improvement to 3.1x over DaDN. Finally, Pragmatic's benefits persist even with an 8-bit quantized representation [5].
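
The mechanism described above can be illustrated with a short software model. The following is a minimal Python sketch (names and structure are illustrative, not the authors' hardware design): it converts the multiplicator on the fly into an explicit list of powers of two and then generates only the corresponding non-zero terms, one shift-and-add at a time, while the multiplicand is handled in bit-parallel form.

```python
def nonzero_powers(multiplicator: int) -> list[int]:
    """Convert a non-negative multiplicator into an explicit list of the
    exponents of its non-zero bits (the powers of two it contains)."""
    return [i for i in range(multiplicator.bit_length()) if (multiplicator >> i) & 1]

def pragmatic_multiply(multiplicand: int, multiplicator: int) -> int:
    """Hybrid bit-parallel multiplicand / bit-serial multiplicator product:
    only the non-zero terms are generated, one shift-and-add per set bit."""
    product = 0
    for exp in nonzero_powers(multiplicator):
        product += multiplicand << exp  # term = multiplicand * 2**exp
    return product

# Example: 13 = 0b1101 has three non-zero bits, so only three terms are added,
# whereas a 16-bit bit-parallel multiplier would generate all 16 terms
# (most of them zero) in parallel.
assert pragmatic_multiply(7, 13) == 7 * 13
```

A bit-parallel multiplier spends time and energy on every term regardless of its value; serializing over only the non-zero powers of two of the multiplicator is what lets PRA skip the ineffectual ones.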

Related research

07/27/2017
Tartan: Accelerating Fully-Connected and Convolutional Layers in Deep Learning Networks by Exploiting Numerical Precision Variability
Tartan (TRT), a hardware accelerator for inference with Deep Neural Netw...

02/14/2023
A Bit-Parallel Deterministic Stochastic Multiplier
This paper presents a novel bit-parallel deterministic stochastic multip...

02/10/2022
Mixture-of-Rookies: Saving DNN Computations by Predicting ReLU Outputs
Deep Neural Networks (DNNs) are widely used in many application domains...

03/09/2018
Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How
We show that, during inference with Convolutional Neural Networks (CNNs)...

04/17/2018
DPRed: Making Typical Activation Values Matter In Deep Learning Computing
We show that selecting a fixed precision for all activations in Convolut...

03/15/2022
Energy-efficient Dense DNN Acceleration with Signed Bit-slice Architecture
As the number of deep neural networks (DNNs) to be executed on a mobile ...

09/04/2023
ADC/DAC-Free Analog Acceleration of Deep Neural Networks with Frequency Transformation
The edge processing of deep neural networks (DNNs) is becoming increasin...
