Understanding the Impact of Precision Quantization on the Accuracy and Energy of Neural Networks

by Soheil Hashemi, et al.

Deep neural networks are gaining in popularity as they are used to generate state-of-the-art results for a variety of computer vision and machine learning applications. At the same time, these networks have grown in depth and complexity in order to solve harder problems. Given the limited power budgets available to these networks, the importance of low-power, low-memory solutions has been stressed in recent years. While a large number of dedicated hardware designs using different precisions have recently been proposed, there exists no comprehensive study of different bit precisions and arithmetic in both inputs and network parameters. In this work, we address this issue and perform a study of different bit precisions in neural networks (from floating-point to fixed-point, powers of two, and binary). In our evaluation, we consider and analyze the effect of precision scaling on both network accuracy and hardware metrics, including memory footprint, power and energy consumption, and design area. We also investigate training-time methodologies to compensate for the reduction in accuracy due to limited bit precision and demonstrate that, in most cases, precision scaling can deliver significant benefits in design metrics at the cost of very modest decreases in network accuracy. In addition, we propose that a small portion of the benefits achieved when using lower precisions can be forfeited to increase the network size and therefore the accuracy. We run our experiments on three well-recognized networks and datasets to show the generality of our findings. We investigate the trade-offs and highlight the benefits of using lower precisions in terms of energy and memory footprint.
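To make the three reduced-precision families named in the abstract concrete, the sketch below quantizes a handful of example weights to fixed-point, powers of two, and binary values. It is an illustration under stated assumptions, not the authors' implementation: the function names, the 8-bit word with 6 fractional bits, the exponent range, and the sample weights are all hypothetical choices made for this example, and the paper may use different rounding or scaling rules.

```python
import math

def quantize_fixed(x, total_bits=8, frac_bits=6):
    """Signed fixed-point: round to a multiple of 2**-frac_bits, then saturate.

    The bit widths are illustrative assumptions, not values from the paper.
    Python's round() resolves exact ties to the nearest even integer.
    """
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    q = max(q_min, min(q_max, round(x * scale)))
    return q / scale

def quantize_pow2(x, min_exp=-7, max_exp=0):
    """Power of two: keep the sign, round the exponent in the log2 domain.

    Multiplying by such a value reduces to a bit shift in hardware.
    The exponent range is an assumption for this sketch.
    """
    if x == 0:
        return 0.0
    exp = round(math.log2(abs(x)))
    exp = max(min_exp, min(max_exp, exp))
    return math.copysign(2.0 ** exp, x)

def binarize(x):
    """Binary: keep only the sign, mapping the value to +1 or -1."""
    return 1.0 if x >= 0 else -1.0

# Hypothetical weights, chosen only to exercise each scheme.
weights = [0.30, -0.72, 0.05, 1.90]

for w in weights:
    print(w, quantize_fixed(w), quantize_pow2(w), binarize(w))
```

Each step down this list stores fewer bits per value and needs simpler arithmetic, which is where the memory and energy savings come from. The price is a larger rounding error, which is the accuracy trade-off the study measures.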


Low-Precision Floating-Point Schemes for Neural Network Training

The use of low-precision fixed-point arithmetic along with stochastic ro...

Hardware-Software Codesign of Accurate, Multiplier-free Deep Neural Networks

While Deep Neural Networks (DNNs) push the state-of-the-art in many mach...

On the accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit and novel 16-bit number formats

Fluid dynamics simulations with the lattice Boltzmann method (LBM) are v...

Comprehensive Benchmarking of Binary Neural Networks on NVM Crossbar Architectures

Non-volatile memory (NVM) crossbars have been identified as a promising ...

FPPU: Design and Implementation of a Pipelined Full Posit Processing Unit

By exploiting the modular RISC-V ISA this paper presents the customizati...

Minimizing Area and Energy of Deep Learning Hardware Design Using Collective Low Precision and Structured Compression

Deep learning algorithms have shown tremendous success in many recogniti...

AddNet: Deep Neural Networks Using FPGA-Optimized Multipliers

Low-precision arithmetic operations to accelerate deep-learning applicat...
