1 Introduction
Deep Neural Networks (DNNs) have produced state-of-the-art results in applications such as computer vision [16], [4] and object detection [30]. As their size continues to grow to improve prediction capabilities, their memory and computational requirements also scale, making them increasingly difficult to deploy on embedded systems. For example, [16] achieved state-of-the-art results on the ImageNet challenge using AlexNet, which required 240MB of storage and 1.45 billion operations to compute inference per image. Several methods of compression
[11], quantization [3] and dimensionality reduction [24] have been applied to reduce these demands, with promising results. This demonstrates the overparametrization and redundancies in DNNs, and poses an opportunity to use regularization to make their representations more amenable to hardware implementations. In particular, low-precision neural networks reduce both memory and computational requirements whilst achieving accuracies comparable to floating point (Gupta et al., 2015). For extremely low precisions, such as binary and/or ternary weight representations and 1-8 bits for activations, most of the multiply-accumulate (MAC) operations can be replaced by simple bitwise operations. This translates to massive reductions in storage requirements and spatial complexity in hardware. Additionally, large power savings and speed gains are achieved when networks can fit in on-chip memory. The issue is that a large reduction in precision leads to large information loss, which incurs significant accuracy degradation, especially for complex datasets such as ImageNet [25]. Ideally, we can train networks which have both high prediction capabilities and minimal computational complexity.
DNN training is an iterative process which has a feedforward path to compute the output and a backpropagation path to calculate gradients and update its parameters for learning. Low-precision networks maintain a set of full-precision weights which are quantized before computing inference. As the quantization functions are piecewise and constant, the gradients of the quantized weights are calculated and applied to update their corresponding full-precision weights. Similarly, derivatives of quantized activations are calculated by using a non-constant differentiable approximation function. This type of training was first proposed as the Straight Through Estimator (STE)
[1], which suggested the use of a nonzero derivative approximation for functions which are non-differentiable or have zero derivatives everywhere. The problem is that without an accurate estimator for weights and activations, there exists a significant gradient mismatch which impinges on learning. As discussed in [21], activations are more robust to quantization than weights for image classification problems, due to weight reuse in Convolutional (CONV) layers affecting multiple operations. To overcome this, methods such as increasing the weight codebook by applying a scaling coefficient to all weights in a layer provide better approximations of weight distributions and greater model capacity [18]. This is computationally inexpensive and can be represented as multiplying each weight layer's matrix by a diagonal scalar matrix, which only requires storage of one value. Applying fine-grained scaling coefficients has also been shown to improve accuracy by increasing model capacity [20], [23]. The problem with all of these fine-grained approaches is either large storage requirements for the scaling coefficients or high computational complexity due to irregular codebook indices. In this paper we present Learning Symmetric Quantization (SYQ), a method to design binary/ternary networks with fine-grained scaling coefficients which preserve low storage requirements and computational efficiency. We do this by learning a symmetric weight codebook via gradient-based optimizations, which enables a minimally-sized square diagonal scalar matrix representation. To reduce the large information loss from CONV layer quantization, we use a more fine-grained pixel/row-wise scaling approach, rather than the layer-wise scaling used in Fully-Connected (FC) layers. In the process, we significantly close the accuracy gap between low-precision networks and their floating point counterparts, whilst preserving their efficient computational structures. Our work makes the following contributions:
Our approach significantly improves the ability of convolutional weights to learn low-precision representations. This is useful as most layers in modern network architectures consist of convolutions, which are typically the least redundant layers.

The proposed method reduces the computational complexity of traditional fine-grained low-precision scaling and imposes minimal hardware cost over layer-wise scaling.

On state-of-the-art networks such as AlexNet, ResNet and VGG, our method is empirically shown to improve accuracy for 1-2 bit weights and 2-8 bit activations.
2 Related Work
Most methods for training low-precision DNNs maintain a set of full precision weights that are deterministically or stochastically quantized during forward or backward propagation. Gradient updates computed with the quantized weights are then applied to the full precision weights [5], [14], [19]. To produce state-of-the-art results on larger models, [23] proposed scaling the quantized weights by the expectation of real-valued weights to recover the dynamic range of each layer. [18] also implemented a similar technique for ternary networks and optimised a non-zero quantization threshold as a function of the weight expectation. Other gradient-based optimization methods for the scaling coefficient have been introduced [33]. Other methods of quantization have also been implemented, e.g. retraining networks using incremental weight subgrouping to produce no accuracy loss for 5-bit weights [31]. Multiple binarizations and a scaling layer were described in [27] to improve accuracy and binarize the last layer. Logarithmic data representations were used to approximate the non-uniform distribution of the weights, activations and gradients down to 3 bits with negligible accuracy loss [21]. Activation quantization has also been investigated, with frameworks created for varying activation bitwidths [32] and for both weights and activations [22]. Improving network learnability under low-precision weights and activations was analysed in [2]. More fine-grained approaches to quantization have effectively clustered weights or grouped filters together and quantized differently based on their statistical distributions [6], [20]. Increasing model capacity by applying scaling coefficients to positive and negative values separately was proposed in [33]. Furthermore, sparse representations were used as regularization to make networks more amenable to hardware [7]. Also, many low-precision DNN hardware implementations have been published [29], [10]. For example, FINN [8], [28] demonstrated the performance gains of being able to store all network weights in on-chip memory by implementing binarized neural networks on FPGAs.
3 Low-Precision Networks
In this section we discuss the motivations behind our work and fundamentals of low-precision neural networks.
3.1 Motivation
Each layer of a DNN computes dot products between weight parameters and its input values. We can represent the output of each hidden unit i as:

y_i = f(w_i · x)   (1)

where f is an element-wise nonlinear activation function, x is the input vector, and w_i is the weight vector of a linear transformation. This computation is repeated throughout the network; therefore overall model complexity is dependent on its structure. As modern networks continue to get deeper and wider, model complexity becomes problematic for their applicability in constrained hardware environments. A solution is to efficiently quantize both weights and activations to very low precisions (1-8 bits) with negligible or no accuracy loss. In doing so, the arithmetic operations are greatly simplified, reducing both computational and resource complexity. In the binary/ternary weight case, MACs are replaced by bit operations. For example, Figure 1 shows the average resource usage on Field Programmable Gate Array (FPGA) hardware to implement a MAC operation under different precisions, which scales quadratically with the multiplier size, i.e. O(n^2) where n is the number of bits (results obtained from instantiating MAC modules using Vivado). As shown, no high precision multipliers (known as DSPs on an FPGA) are required for precisions less than or equal to ternary weights and 8-bit activations. Furthermore, the logic element (known as LUTs on an FPGA) requirement reduces proportionally with both weight and activation precisions. Additionally, the storage requirements for both weights and activations are reduced proportionally with precision. This significantly improves the network's ability to fit in on-chip memory and constrained hardware environments, and broadens the applicability of DNNs.
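To make the bit-operation claim concrete, here is a small NumPy sketch (our own illustration, not from the paper) showing that a dot product with binary {-1, +1} weights needs no multiplies at all, only additions and subtractions:

```python
import numpy as np

def binary_dot(x, w_sign):
    """Dot product with binary weights in {-1, +1}: no multiplies,
    just add the inputs where the weight is +1 and subtract where -1."""
    pos = w_sign > 0
    return x[pos].sum() - x[~pos].sum()

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
w = np.sign(rng.standard_normal(64))   # binary weight vector
assert np.isclose(binary_dot(x, w), np.dot(x, w))
```

On custom hardware the same idea becomes an adder tree with sign flips, which is what removes the DSP multipliers discussed around Figure 1.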
For a CONV layer, all weights are typically represented as a tensor W ∈ R^{K×K×I×N}, where K is the filter size, I is the number of input feature maps and N the number of output feature maps. In low-precision networks, each weight layer can typically be represented by a diagonal scalar matrix D multiplied by a quantized weight matrix Q. Also, the activation function f can be approximated using a piecewise constant activation function. In our proposed method, we observe that by ensuring the quantization levels for W are symmetric around zero, we can construct efficient square diagonal matrix representations of D, which enable fine-grained quantization whilst having minimal memory requirements (of size K^2 or K). This translates to a reduction in overall model complexity and high prediction capabilities. Although we restrict ourselves to structured matrices and low-precision weights and activations, the network efficiently captures information through our gradient-based symmetric quantizer, which learns the diagonal elements of D during training.
3.2 Weight Quantization
For low-precision DNNs, the distribution of the full precision weight matrix W_l of each layer l is approximated by a function Q(·), resulting in a quantized weight matrix Q_l:

Q_l = Q(W_l)   (2)

for l = 1, ..., L and Q_l^{i,j} ∈ C_l. The codebook C_l is the set of all possible values for Q_l, where c_i and i represent each codebook value and its index respectively. For example, binary and ternary weight spaces have C = {-1, 1} and C = {-1, 0, 1} respectively. Efficient functions for binarizing and ternarizing weight parameters have been proposed as piecewise constant functions in [18], such that:
Q_l^{i,j} = sign(W_l^{i,j}) ⊙ M_l^{i,j}   (3)

with,

M_l^{i,j} = 0 if |W_l^{i,j}| < η,  M_l^{i,j} = 1 otherwise,   (4)

where M represents a masking matrix and η is the quantization threshold hyperparameter. η = 0 for binary networks, and in our work we set η = 0.05 × max(|W_l|) for ternary networks as in [33]. The issue with discretization of the weights is that it leads to the vanishing gradients problem
[1]. To overcome this, an STE is defined to replace the zero derivatives of the piecewise constant function in (3) with a nonzero surrogate derivative [14]. During training, Q_l is used for inference and backpropagation, and the corresponding elements in W_l are updated based on these gradients. Hence the STE is defined as:

∂E/∂W_l^{i,j} = ∂E/∂Q_l^{i,j}   (5)

where E is the error function for a network without scaling coefficients. After training, the full precision weights are discarded and we require only the quantized weights for deployment. Whilst these methods greatly reduce computational complexity by eliminating floating point MACs, they increase the difficulty of learning.
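As an illustration of Eqs. (3)-(5), the following NumPy sketch ternarizes a toy weight matrix and passes the gradient straight through; the threshold choice η = 0.05 × max|W| follows the ternary setting described above, and all variable names are ours:

```python
import numpy as np

def ternarize(W, eta):
    """Piecewise-constant quantizer of Eqs. (3)-(4): weights with magnitude
    below the threshold eta are masked to zero, the rest keep their sign."""
    M = (np.abs(W) >= eta).astype(W.dtype)  # masking matrix M
    return np.sign(W) * M

W = np.array([[0.8, -0.03], [-0.6, 0.10]])
eta = 0.05 * np.abs(W).max()                # ternary threshold as in [33]
Q = ternarize(W, eta)                        # entries in {-1, 0, +1}

# STE of Eq. (5): the gradient w.r.t. Q is applied directly to W,
# so the full-precision weights receive dE/dQ unchanged.
dE_dQ = np.ones_like(Q)
dE_dW = dE_dQ
```

The forward pass sees only Q; the full-precision W is updated with dE_dW and is discarded after training.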
3.3 Scaling
The introduction of scaling coefficients improves learning capabilities by providing greater model capacity and compensating for the large information loss due to binary/ternary quantization. Scaling discrete weight representations requires multiplying all Q_l by positive scaling coefficients α_l. We want to find optimal scaling coefficients for each layer, α_l*, which minimize our error function:

α_l* = argmin_{α_l > 0} E_α(Q_l, α_l)   (6)
with E_α representing the error function with scaling coefficients. Finding the optimal α_l is vital to reducing gradient mismatches in the forward and backward functions. It was proposed in [32] as the mean of absolute weight values for each layer:

α_l = (1/n) Σ_{i,j} |W_l^{i,j}|   (7)
where n is the total number of layer weights. The codebook for each layer after scaling in (7) is symmetric: C_l = {-α_l, α_l}, and the scalars become per-layer learning rate multipliers. Additionally, the STE in (8) reduces the gradient mismatch from (5) by including information from the full precision weights:

∂E_α/∂W_l^{i,j} = α_l · ∂E_α/∂Q_l^{i,j}   (8)
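The symmetric codebook produced by layer-wise scaling can be checked numerically; this toy NumPy sketch (our own illustration of Eq. (7)) scales binary weights by the layer's mean absolute value:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((16, 16))

alpha = np.abs(W).mean()        # Eq. (7): mean absolute weight of the layer
Q_hat = alpha * np.sign(W)      # scaled binary weights

# the scaled codebook {-alpha, +alpha} is symmetric around zero
assert np.allclose(np.sort(np.unique(Q_hat)), [-alpha, alpha])
```

Only the single scalar alpha needs storing alongside the 1-bit signs.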
Gradient-based optimizations for the scaling coefficients were also introduced in [33], which applied different scaling coefficients α_l^p and α_l^n to positive and negative Q_l values respectively to improve model capacity and accuracy. These are updated during backpropagation using the gradients:

∂E/∂α_l^p = Σ_{(i,j) ∈ S_l^p} ∂E/∂Q̂_l^{i,j},   ∂E/∂α_l^n = Σ_{(i,j) ∈ S_l^n} ∂E/∂Q̂_l^{i,j}   (9)

where Q̂_l denotes the scaled quantized weights, initially α_l^p = α_l^n = 1, and S_l^p, S_l^n are the codebook index sets for each layer, i.e. S_l^p = {(i,j) : W_l^{i,j} > η} and S_l^n = {(i,j) : W_l^{i,j} < -η}. This allows each layer's codebook values to be asymmetric around zero, such that α_l^p ≠ α_l^n. The codebook indices are then highly irregular and unordered, which increases computational complexity as the matrices cannot be easily decomposed. Rather, we have to check the sign of every element before computation, leading to extra branching instructions on conventional computing platforms such as CPUs/GPUs and additional logic in custom hardware. The difficulty of designing low-precision networks which have both high learning capabilities and computational efficiency can be solved by learning a symmetric codebook during training and exploiting structured matrix representations.
4 SYQ Structural Representations
We now propose matrix representations of SYQ by partitioning the quantization into weight subgroups. Diagonal matrix representations consist mainly of zeros and have nonzero entries along the main diagonal. A matrix D ∈ R^{m×n} is diagonal if D_{i,j} = 0 for all i ≠ j, and square if m = n. A square diagonal matrix with all main diagonal entries equal is a scalar matrix. A diagonal matrix is defined by the vector d = (d_1, ..., d_m) of its main diagonal entries: D = diag(d).
Diagonal matrix multiplication is very computationally efficient as it can be easily decomposed and only the scalar vector requires storage.
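A brief NumPy illustration (ours) of why this matters: multiplying by a diagonal matrix decomposes into a per-row scale, so only the vector d is ever stored or touched:

```python
import numpy as np

rng = np.random.default_rng(2)
d = rng.random(4) + 0.1                      # diagonal entries: all we store
Q = np.sign(rng.standard_normal((4, 8)))     # quantized weight matrix

explicit = np.diag(d) @ Q    # full diagonal-matrix product
decomposed = d[:, None] * Q  # per-row scale: no matrix ever materialised

assert np.allclose(explicit, decomposed)
```

The decomposed form is what the hardware actually computes: one low-precision sub-dot product per row, followed by one scalar multiply.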
4.1 Layers
CONV and FC layers have differing computational requirements and sensitivities to network redundancies. CONV weights are reused many times across the input feature map, whereas FC weights are used only once per image. Hence, the quantization error of each weight in a CONV layer impacts the dot products across the entire input feature map volume, rather than just once for FC weights. Thus, a fine-grained approach to CONV layers is effective at compensating for this error. Quantized CONV weights are represented as a tensor Q ∈ C^{K×K×I×N}. As typically K^2 ≪ N, it is optimal to have a diagonal scalar matrix of size K^2 or even K, as only small scalar vectors are required for storage. By reshaping the tensor Q, we form a matrix Q' with K^2 or K rows, and represent our scalar matrix multiplication as DQ', with the square diagonal matrix D of size K^2 × K^2 or K × K respectively. FC layers are represented as a matrix of size H × A, where H is the number of hidden nodes and A the number of activation neurons. As FC layers are more robust to quantization, one learnable scaling coefficient (layer-wise) for the FC layer can sufficiently approximate the distribution and can also be represented with scalar matrix computation. All elements in diag(D) are then equal and we only require storage of one value.
4.2 Subgroups
More fine-grained quantization can improve approximations of the statistical distributions of weights. We implement pixel-wise scaling for CONV layers, which involves grouping all spatially equivalent pixels along the I × N dimension. This results in different values for all the main diagonal elements of D. With this representation, we can still decompose the matrix computation along each pixel dimension and exploit the parallel nature of convolutions, as shown in Figure 2. We do this by creating K^2 subgroups, each with its own codebook index set S_i. Other granularities such as row-wise scaling involve grouping all pixels along a row or column, resulting in K subgroups with repeated main diagonal elements (as illustrated in Figure 2), and layer-wise scaling uses a single subgroup covering the whole layer. Different granularities affect both accuracy and computation, as further explored in Sections 6 and 7.
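The pixel-wise partitioning can be sketched in a few NumPy lines; tensor shapes follow the K×K×I×N convention above, with toy sizes of our own choosing:

```python
import numpy as np

K, I, N = 3, 8, 16                       # toy filter size and feature-map counts
rng = np.random.default_rng(3)
W = rng.standard_normal((K, K, I, N))

Wp = W.reshape(K * K, I * N)             # one row per spatial pixel position
alpha = np.abs(Wp).mean(axis=1)          # one scaling coefficient per pixel
Q_hat = alpha[:, None] * np.sign(Wp)     # diag(alpha) applied row-wise

assert alpha.shape == (K * K,)           # only K^2 scalars need storing
```

Row-wise scaling is the same sketch with a (K, K*I*N) reshape and K scalars instead of K^2.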
5 SYQ Training
In this section we describe the methodology used to efficiently train SYQ networks.
5.1 Symmetric Quantizer
When training low-precision inference networks, the aim is to have the smallest possible codebook. Typically, as the codebook size increases, a network will approach full-precision performance but increase hardware cost. However, certain codebook representations are significantly more hardware friendly than others and do not necessarily impose any hardware costs. Given a codebook C, and the nonzero codebooks C^+ = {c ∈ C : c > 0} and C^- = {c ∈ C : c < 0}, a quantizer is denoted as symmetric if:

C^- = -C^+   (10)
Learning this type of codebook requires updating only one scaling coefficient during training for each pair of bipolar codebook values. With Q̂_l^{i,j} = α_l^i Q_l^{i,j} denoting the scaled quantized weights, the gradient of the scaling coefficient α_l^i of each subgroup S_l^i becomes:

∂E/∂α_l^i = Σ_{j ∈ S_l^i} (∂E/∂Q̂_l^{i,j}) · Q_l^{i,j}   (11)
When computing binary/ternary weight representations followed by a scale, it is ideal to have a codebook which is symmetric around zero, as the codebook storage requirements are almost halved. This is because only the absolute value of the two symmetric values needs to be stored. Additionally, the codebook indices become highly regular and ordered for the scalar multiply, which greatly reduces computational complexity. The nature of symmetric quantization enables the opportunity to implement fine-grained quantization (pixel/row-wise) whilst maintaining the scalar matrix multiplication structure used in layer-wise scaling. This is also advantageous as the scaling coefficients become fine-grained adaptive learning rate multipliers for each pixel/row in a CONV layer, i.e. the STE becomes:

∂E/∂W_l^{i,j} = α_l^i · ∂E/∂Q_l^{i,j}   (12)

As the scaling coefficients more accurately approximate each subgroup's weight distribution and are learned via gradients, the gradient mismatch is significantly reduced for weight quantization, which enhances network learning.
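A quick numerical sanity check of the subgroup gradient in Eq. (11), using a toy quadratic loss of our own in place of the network error E:

```python
import numpy as np

rng = np.random.default_rng(4)
Q = np.sign(rng.standard_normal((4, 8)))   # symmetric {-1,+1} codebook entries
T = rng.standard_normal((4, 8))            # toy target for a quadratic loss
alpha = np.abs(rng.standard_normal(4)) + 0.1

def loss(a):
    """E = 0.5 * || diag(a) Q - T ||^2, a stand-in for the network error."""
    return 0.5 * np.sum((a[:, None] * Q - T) ** 2)

# Eq. (11): dE/dalpha_i sums dE/dQhat over subgroup i, weighted by Q
dE_dQhat = alpha[:, None] * Q - T
grad = np.sum(dE_dQhat * Q, axis=1)

# finite-difference check of the analytic gradient for one subgroup
eps, i = 1e-6, 2
step = eps * np.eye(4)[i]
numeric = (loss(alpha + step) - loss(alpha - step)) / (2 * eps)
assert np.isclose(grad[i], numeric, atol=1e-4)
```

Because there is one coefficient per subgroup, the update costs one reduction over the subgroup per step.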
5.2 Initialization
The solution to nonconvex optimizations such as gradient descent depends heavily on parameter initialization to avoid vanishing or exploding activations/gradients and ensure network convergence [9]. For low-precision networks, excessive gradient mismatches between the forward and backward functions must be minimized, otherwise the gradients will not propagate well. To deal with this concern, the scaling coefficients are initialized as the mean of the full precision weights in their corresponding subgroup. For example, the scaling coefficient in pixel-wise scaling is:

α_l^i = (1/(I·N)) Σ_{j ∈ S_l^i} |W_l^{i,j}|   (13)

Layer-wise scaling in FC layers initializes α_l as the mean of the absolute values of all layer weights. By incorporating information from the full precision weights, we aim to reduce the mismatch initially; the scaling coefficients are then optimized during backpropagation.
5.3 Activations Quantization
Our forward path approximation to f in (1) uniformly quantizes a real number x ∈ [0, p] to a k-bit number x_q:

x_q = ⌊x · 2^f⌋ / 2^f   (14)

where ⌊·⌋ represents the round-down operation and p is the upper bound. x itself is bounded by its arbitrary unsigned two's complement fixed point representation, where f is the number of fractional bits and p = 2^{k-f} - 2^{-f}. Uniform quantization translates to a reduction in hardware implementation complexity. To achieve this, we use the following STE for the activations:

∂E/∂x = ∂E/∂x_q   (15)

Differences in the forward and backward activation functions create a gradient mismatch which can result in unstable and inefficient learning. To minimize this issue, we adjust f as a hyperparameter. The overall SYQ training process is summarized in Algorithm 1.
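A hedged sketch of the uniform quantizer in Eq. (14), with f fractional bits and the bound p = 2^{k-f} - 2^{-f} taken from the text (the clipping step is our reading of how x is kept in [0, p]):

```python
import numpy as np

def quantize_act(x, k, f):
    """Sketch of the uniform k-bit quantizer in Eq. (14): clip x to the
    unsigned fixed-point range with f fractional bits, then round down."""
    p = 2.0 ** (k - f) - 2.0 ** (-f)    # upper bound of the representation
    return np.floor(np.clip(x, 0.0, p) * 2.0 ** f) / 2.0 ** f

x = np.array([-0.3, 0.26, 1.7, 9.0])
q = quantize_act(x, k=4, f=2)           # levels spaced 0.25 apart, max 3.75
```

With k = 4 and f = 2 the representable levels are {0, 0.25, ..., 3.75}, so negative inputs clip to 0 and large inputs saturate at p.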
6 Experiments
To demonstrate the versatility of SYQ, we applied it to several state-of-the-art benchmark models, all with different network topologies. We use binary/ternary weights and varying activation bitwidths for classification of the large-scale ImageNet dataset. The ILSVRC-2012 ImageNet dataset is a natural high-resolution visual classification dataset consisting of 1000 classes, 1.28 million training images and 50K validation images. Inputs are resized before being randomly cropped. We report our single-crop evaluation results using Top-1 and Top-5 accuracy.
6.1 Networks
We compare our results to the full precision baseline and benchmark reference model accuracies in Table 1 (our ResNet and AlexNet reference results are obtained from https://github.com/facebook/fb.resnet.torch and https://github.com/BVLC/caffe, respectively), showing that SYQ training achieves similar accuracy to floating point. This suggests the noise induced by replacing floating point weight layers with SYQ versions provides effective regularization during training. An AlexNet [17] variant is implemented which eliminates dropout and includes batch normalization [15]. A mini batch size of 64 is used, with L2 weight decay of 5e-6, and our learning rate is initially 1e-4 with step decays of scale factor 0.2. For ResNet [12], we test on the 18, 34 and 50 layer variations. Our batch size is 128 and the learning rate is initially 1e-3 with step decay of factor 0.2. We also test on a variant of VGG-16 [26], using model-A in [13] with the spp layer replaced by a max pool and only 3 CONV layers rather than 5 for input size blocks of 56, 28 and 14, as in
[2]. Batch sizes are set to 32 and our learning rate is initially 1e-4 with a step decay of factor 0.2. The VGG and ResNet models were initialized from floating point baseline weights. Full-precision weights are used for the first and last layer. All other CONV layers are quantized with SYQ pixel-wise scaling, FC layers with layer-wise scaling, and the activations of all layers using (14).

Table 1: Accuracy (%) with 1-bit (1-8) and 2-bit (2-8) weights and 8-bit activations, versus full-precision baseline and reference models.

Model              1-8    2-8    Baseline  Reference
AlexNet    Top-1   56.6   58.1   56.6      57.1
           Top-5   79.4   80.8   80.2      80.2
VGG        Top-1   66.2   68.7   69.4      -
           Top-5   87.0   88.5   89.1      -
ResNet-18  Top-1   62.9   67.7   69.1      69.6
           Top-5   84.6   87.8   89.0      89.2
ResNet-34  Top-1   67.0   70.8   71.3      73.3
           Top-5   87.6   89.8   89.1      91.3
ResNet-50  Top-1   70.6   72.3   76.0      76.0
           Top-5   89.6   90.9   93.0      93.0
6.2 Changing Granularity Via Weight Subgroups
Weight subgroups can be arbitrarily designed for a given hardware application. Table 2 shows the accuracy differences of row-wise and layer-wise scaling relative to pixel-wise scaling on AlexNet; it suggests pixel-wise and row-wise are only marginally different, especially at higher precisions, while both are considerably more accurate than layer-wise.
Table 2: Accuracy difference (%) of row-wise and layer-wise scaling relative to pixel-wise scaling on AlexNet.

                Row-wise        Layer-wise
Weights  Act.   Top-1  Top-5    Top-1  Top-5
1        2      -0.7   -0.5     -1.4   -2.2
1        8      -0.1   -0.3     -0.4   -2.2
2        2      +0.1    0.0     -1.3   -1.5
2        8      -0.1   -0.1     -1.9   -1.7
This demonstrates the effectiveness of fine-grained quantization of CONV layers over layer-wise scaling and promotes the exploration of efficient representations of scalar computation. It also shows the effectiveness of row-wise quantization, which typically incurs a smaller memory requirement at a small accuracy drop, for a significant gain in the potential parallelism of the network.
6.3 Comparisons To Previous Work
We compare SYQ explicitly using AlexNet, ResNet-18 and ResNet-50 in Tables 3, 4 & 5, as these networks have been extensively studied in the literature. Our ternary results with 8-bit activations (2w-8act) improve on the state-of-the-art for all three networks. Our 2w-4act result for ResNet-50 also improves on the state-of-the-art, FGQ. This is also the case for binary weights, such as 1w-8act ResNet-18 and AlexNet with 1w-2act and 1w-4act. For extremely low 1w-2act representations, SYQ also achieves a 2.7% increase in Top-1 accuracy over the state-of-the-art, HWGQ. This demonstrates SYQ's strength in producing high accuracy. Additionally, it shows that multiple learnable scaling coefficients effectively reduce the gradient mismatch between the forward and backward paths, translating to efficient learning under low-precision constraints.
Table 3: AlexNet Top-1/Top-5 accuracy (%) comparison.

Model            Weights  Act.  Top-1  Top-5
DoReFa-Net [32]  1        2     49.8   -
QNN [14]         1        2     51.0   73.7
HWGQ [2]         1        2     52.7   76.3
SYQ              1        2     55.4   78.6
DoReFa-Net [32]  1        4     53.0   -
SYQ              1        4     56.2   79.4
BWN [23]         1        32    56.8   79.4
SYQ              1        8     56.6   79.4
SYQ              2        2     55.8   79.2
FGQ [20]         2        8     49.04  -
TTQ [33]         2        32    57.5   79.7
SYQ              2        8     58.1   80.8
6.4 Varying Activation Bitwidth
The most important result is that SYQ efficiently quantizes networks with low precisions for both weights and activations. From Figure 3, we can see that lowering the precision of the activations does not severely alter the training curve, suggesting that the gradient information from pixel-wise scaling coefficients in SYQ compensates well for the loss of information. However, when quantizing down to 2 bits, the training error curve does become more volatile, demonstrating instabilities in network learning. We also report the classification accuracies for varying activation bitwidths on AlexNet and ResNet-50 in Tables 3 & 5, which show that there is minimal discrepancy from the full-precision networks with as low as 4-bit activations. These results are extremely promising and have strong implications for specialized hardware implementations of low-power DNNs.
7 Hardware Implications
In this section we discuss the computational implications of different scaling operations and present a design for specialized hardware implementations.
7.1 Computational and Memory Complexity
Consider a CONV layer with K^2·I·N·H^2 operations, where H is the input feature map (IFM) dimension. Layer-wise scaling, as in DoReFa-Net, requires one scaling coefficient per K^2·I·N·H^2 operations.
Table 6: Number of scaling coefficients stored per CONV layer for each scaling method.

Method               Scalars
Layer (DoReFa)       1
Row (SYQ)            K
Pixel (SYQ)          K^2
Asymmetric (TTQ)     2
Grouping (FGQ)       K^2·N/4
Channel (HWGQ/BWN)   N
For channel-wise scaling in HWGQ and BWN, N scaling coefficients are required, as there is one per output feature map, where typically N ≫ K^2. TTQ implements asymmetric layer-wise quantization, which requires two scaling coefficients per layer and additional operations, as we add a branching operation for each weight due to irregular codebook indices, as described in Section 3.3. FGQ uses pixel-wise scaling for every 4 filters, whereas SYQ uses pixel-wise scaling across all N filters; hence FGQ requires K^2·N/4 scaling coefficients. For pixel-wise SYQ scaling, K^2 scaling coefficients are required, where K^2 ≪ N for most CONV layers in modern networks. For row-wise SYQ scaling, K scaling coefficients are required. These results are displayed in Table 6, demonstrating the benefits of maintaining a diagonal representation for the scalar matrix multiplication of each layer, as we improve either computational or memory complexity against all other fine-grained methods. Another key benefit of SYQ is its amenability to highly parallel processors.
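The scalar counts above can be tabulated for a toy layer; the FGQ count assumes the factor-4 filter grouping mentioned in the text, and the exact figures are our reading of Section 7.1 rather than the paper's Table 6:

```python
# Scaling-coefficient storage per CONV layer for each granularity, as read
# from Section 7.1 (K = filter size, N = output feature maps; the factor-4
# filter grouping for FGQ is taken from the text). Toy layer: K=3, N=256.
K, N = 3, 256

scalars = {
    "layer-wise (DoReFa)": 1,
    "asymmetric layer-wise (TTQ)": 2,
    "row-wise (SYQ)": K,
    "pixel-wise (SYQ)": K * K,
    "channel-wise (HWGQ/BWN)": N,
    "pixel-wise per 4 filters (FGQ)": K * K * N // 4,
}
for method, count in sorted(scalars.items(), key=lambda kv: kv[1]):
    print(f"{method:32s}{count:6d}")
```

For a typical 3×3 layer the SYQ pixel-wise store is 9 scalars regardless of N, versus counts that grow with N for the channel-wise and FGQ schemes.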
7.2 Architectural Design
For the CONV layer, the operations are a sum of dot products between the input and kernel filter. In order to reduce compute complexity, we increase the number of operations in each dot product, while significantly decreasing the complexity of each operation. For example, the size of the input vector in the calculation of each dot product is K^2·I. The number of operations is K^2·I for multiplies and K^2·I - 1 for additions. Given that we have a limited codebook for our weights, we can break it into sub-dot products, applying the scaling factor α_i after computing the sub-dot product for that set of symmetrically constrained weights. For pixel-wise quantization, the total multiplies becomes K^2·I + K^2 and the total adds remain K^2·I - 1. However, the first term in each of these calculations can be done at significantly lower precision. For multiplies this means a binary or ternary multiply, which can often be implemented as a bit-flip. To compute this in specialized hardware, for layer-wise scaling, we have a parallel MAC tree which consists of a multiply of an input and a binary/ternary number (represented as a dot) followed by an adder tree to sum up the outputs. These outputs are fed into a multiplier to compute the scale, followed by an accumulator to store the outputs before being fed into the activation function. This architecture is shown in Figure 4. For every hardware block of this type, our per-pixel/row scaling only requires one additional ring counter which stores the scaling coefficients and shifts the input to the scaling multiplier through an index counter as each row/pixel finishes computing, which is computationally inexpensive. As in the equivalent layer-wise scaling architecture, we can still maintain one multiplier in hardware and only increase memory slightly to store the scaling coefficients.
Table 7 shows the resource and performance estimates provided by Vivado HLS of the described hardware architecture for a target Xilinx ZU3 FPGA device at an estimated clock frequency of over 300 MHz. The main design is based on the MVTU described in FINN [28], with an extension to 2-bit activations and pixel-wise and row-wise SYQ. The layer-wise baseline uses no multiplies, as these can be absorbed into the quantization thresholds for activations [28]. The MVTU was configured for a convolution layer while scaling the size of the MAC tree (SIMD) and the number of parallel processors (PE). As shown, the BRAM (18Kb memory blocks on an FPGA) and LUT usage is almost identical, while the DSP usage increases proportionally with the number of parallel output channels being processed. The increase in DSPs is not necessarily costly for the ZU3, as we are able to utilize more of the total available resources. Resource usage is only shown for pixel-wise SYQ, as row-wise differed only in LUT usage, by less than 2%.
Table 7: Resource usage estimates on a Xilinx ZU3 FPGA (ZU3 row shows total available resources).

Config   SIMD  PE   BRAMs  LUTs (k)  DSPs
Layer    32    32   64     29.8      4
Layer    64    32   64     56.5      4
Layer    32    64   64     58.9      4
SYQ(P)   32    32   64     29.4      36
SYQ(P)   64    32   64     56.1      36
SYQ(P)   32    64   64     57.7      68
ZU3      -     -    432    70.6      360
8 Conclusions
The problem of efficiently training large DNNs with low-precision weights and activations was considered. We proposed learning symmetric quantization (SYQ) for DNNs in order to maximize network learning whilst minimizing hardware complexity. This was achieved by constraining the solution to low-precision representations and learning a diagonal scalar matrix using gradient-based optimizations for efficient computation. As a result, we reduce the computational requirements of fine-grained quantization and achieve state-of-the-art accuracies on modern benchmark networks.
Acknowledgements
This research was partly supported under the Australian Research Council's Linkage Projects funding scheme (project number LP130101034) and Zomojo Pty Ltd.
References
 [1] Y. Bengio, N. Léonard, and A. C. Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. CoRR, abs/1308.3432, 2013.
 [2] Z. Cai, X. He, J. Sun, and N. Vasconcelos. Deep learning with low precision by half-wave Gaussian quantization. CoRR, abs/1702.00953, 2017.
 [3] W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen. Compressing neural networks with the hashing trick. CoRR, abs/1504.04788, 2015.
 [4] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. P. Kuksa. Natural language processing (almost) from scratch. CoRR, abs/1103.0398, 2011.
 [5] M. Courbariaux, Y. Bengio, and J. David. BinaryConnect: Training deep neural networks with binary weights during propagations. CoRR, abs/1511.00363, 2015.

 [6] Y. Duan, J. Lu, Z. Wang, J. Feng, and J. Zhou. Learning deep binary descriptor with multiquantization. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
 [7] J. Faraone, N. J. Fraser, G. Gamberdella, M. Blott, and P. H. W. Leong. Compressing low precision deep neural networks using sparsity-induced regularization in ternary networks. CoRR, abs/1709.06262, 2017.
 [8] N. J. Fraser, Y. Umuroglu, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers. Scaling binarized neural networks on reconfigurable logic. In Proceedings of the 8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms, PARMA-DITAM '17, pages 25-30, New York, NY, USA, 2017. ACM.

 [9] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS'10), 2010.
 [10] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally. EIE: Efficient inference engine on compressed deep neural network. CoRR, abs/1602.01528, 2016.
 [11] S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. CoRR, abs/1510.00149, 2015.
 [12] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
 [13] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. CoRR, abs/1502.01852, 2015.
 [14] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. CoRR, abs/1609.07061, 2016.
 [15] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015.

 [16] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, NIPS'12, pages 1097-1105, USA, 2012. Curran Associates Inc.
 [17] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 25, pages 1097-1105. Curran Associates, Inc., 2012.
 [18] F. Li and B. Liu. Ternary weight networks. CoRR, abs/1605.04711, 2016.
 [19] Z. Lin, M. Courbariaux, R. Memisevic, and Y. Bengio. Neural networks with few multiplications. CoRR, abs/1510.03009, 2015.
 [20] N. Mellempudi, A. Kundu, D. Mudigere, D. Das, B. Kaul, and P. Dubey. Ternary neural networks with fine-grained quantization. CoRR, abs/1705.01462, 2017.
 [21] D. Miyashita, E. H. Lee, and B. Murmann. Convolutional neural networks using logarithmic data representation. CoRR, abs/1603.01025, 2016.
 [22] E. Park, J. Ahn, and S. Yoo. Weighted-entropy-based quantization for deep neural networks. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
 [23] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi. XNOR-Net: ImageNet classification using binary convolutional neural networks. CoRR, abs/1603.05279, 2016.
 [24] S. Ravi. ProjectionNet: Learning efficient on-device deep networks using neural projections. CoRR, abs/1708.00630, 2017.
 [25] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. Int. J. Comput. Vision, 115(3):211-252, Dec. 2015.
 [26] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
 [27] W. Tang, G. Hua, and L. Wang. How to train a compact binary neural network with high accuracy? In AAAI, 2017.
 [28] Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. H. W. Leong, M. Jahre, and K. A. Vissers. FINN: A framework for fast, scalable binarized neural network inference. CoRR, abs/1612.07119, 2016.
 [29] G. Venkatesh, E. Nurvitadhi, and D. Marr. Accelerating deep convolutional networks using low-precision and sparsity. CoRR, abs/1610.00324, 2016.
 [30] P. Viola and M. Jones. Robust real-time object detection. In International Journal of Computer Vision, 2001.
 [31] A. Zhou, A. Yao, Y. Guo, L. Xu, and Y. Chen. Incremental network quantization: Towards lossless CNNs with low-precision weights. CoRR, abs/1702.03044, 2017.
 [32] S. Zhou, Z. Ni, X. Zhou, H. Wen, Y. Wu, and Y. Zou. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. CoRR, abs/1606.06160, 2016.
 [33] C. Zhu, S. Han, H. Mao, and W. J. Dally. Trained ternary quantization. CoRR, abs/1612.01064, 2016.