Dataflow-based Joint Quantization of Weights and Activations for Deep Neural Networks

01/04/2019
by Xue Geng, et al.

This paper addresses a challenging problem: how to reduce energy consumption without incurring a performance drop when deploying deep neural networks (DNNs) at the inference stage. To alleviate the computation and storage burdens, we propose a novel dataflow-based joint quantization approach built on the hypothesis that fewer quantization operations incur less information loss and thus improve final performance. It first introduces a quantization scheme with efficient bit-shifting and rounding operations to represent network parameters and activations in low precision. It then restructures the network architecture into unified modules for optimization on the quantized model. Extensive experiments on ImageNet and KITTI validate the effectiveness of our model, demonstrating that this quantized model achieves state-of-the-art results on various tasks. In addition, we designed and synthesized an RTL model to measure the hardware costs of various quantization methods. Per quantization operation, our scheme reduces area cost by about 15 times and energy consumption by about 9 times compared to a strong baseline.
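
The bit-shifting-and-rounding idea can be illustrated with a short sketch. The snippet below is not the paper's implementation; it is a minimal NumPy illustration assuming a fixed power-of-two scale (the hypothetical `shift` parameter), so that the scaling multiply reduces to a bit shift, whereas the paper derives its quantization parameters from the joint dataflow analysis. The function names are placeholders.

```python
import numpy as np

def shift_round_quantize(x, num_bits=8, shift=4):
    """Quantize a float tensor with shift-and-round arithmetic.

    Scaling by a power of two (2**shift) lets hardware replace the
    multiply with a bit shift; values are then rounded and clipped
    to the signed num_bits range. The fixed `shift` here is an
    assumption for illustration only.
    """
    qmin = -(1 << (num_bits - 1))        # e.g. -128 for 8 bits
    qmax = (1 << (num_bits - 1)) - 1     # e.g. +127 for 8 bits
    q = np.round(x * (1 << shift))       # scale via left shift, then round
    return np.clip(q, qmin, qmax).astype(np.int8 if num_bits <= 8 else np.int32)

def shift_dequantize(q, shift=4):
    """Recover an approximate float value via the inverse power-of-two scale."""
    return q.astype(np.float32) / (1 << shift)

# Usage: quantize a small activation tensor, then inspect the rounding error.
x = np.random.randn(4).astype(np.float32)
q = shift_round_quantize(x, num_bits=8, shift=4)
print(x, q, shift_dequantize(q))
```

Because the scale is a power of two, both quantize and dequantize avoid general multipliers, which is the kind of arithmetic simplification that drives the area and energy savings reported above.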

Related research:

- DNN Quantization with Attention (03/24/2021): Low-bit quantization of network weights and activations can drastically ...
- Joint Training of Low-Precision Neural Network with Quantization Interval Parameters (08/17/2018): Optimization for low-precision neural network is an important technique ...
- MWQ: Multiscale Wavelet Quantized Neural Networks (03/09/2021): Model quantization can reduce the model size and computational latency, ...
- Minimum Energy Quantized Neural Networks (11/01/2017): This work targets the automated minimum-energy optimization of Quantized...
- On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks (10/22/2020): We present two methods to reduce the complexity of Bayesian network (BN)...
- A Learning Framework for n-bit Quantized Neural Networks toward FPGAs (04/06/2020): The quantized neural network (QNN) is an efficient approach for network ...
- Going Further With Winograd Convolutions: Tap-Wise Quantization for Efficient Inference on 4x4 Tile (09/26/2022): Most of today's computer vision pipelines are built around deep neural n...
