Ternary Quantization: A Survey

03/02/2023
by Dan Liu et al.

Inference time, model size, and accuracy are critical considerations when deploying deep neural network models. Numerous research efforts have sought to compress neural network models while accelerating inference and preserving accuracy; pruning and quantization are the mainstream methods to this end. During model quantization, converting individual floating-point weight values to low-precision ones substantially reduces computational overhead and improves inference speed. Many quantization methods have been studied, for example, vector quantization, low-bit quantization, and binary/ternary quantization. This survey focuses on ternary quantization, which constrains each weight to one of three values. We review the evolution of ternary quantization and investigate the relationships among existing ternary quantization methods from the perspective of projection functions and optimization methods.
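As a concrete illustration of the projection functions the survey discusses, the sketch below shows a threshold-based ternary projection in the spirit of Ternary Weight Networks: weights below a threshold are zeroed, and the rest are mapped to a single shared magnitude with their original sign. The `ternarize` helper, the 0.7 threshold factor, and the example weights are illustrative assumptions, not the survey's own notation.

```python
import numpy as np

def ternarize(w, delta_factor=0.7):
    """Project float weights onto the ternary set {-alpha, 0, +alpha}.

    Entries with |w| <= delta are set to 0; the remaining entries keep
    their sign and share one scaling factor alpha (the mean magnitude
    of the surviving entries). The 0.7 factor is a common heuristic
    choice for the threshold, assumed here for illustration.
    """
    delta = delta_factor * np.abs(w).mean()   # magnitude threshold
    mask = np.abs(w) > delta                  # entries kept as +/-1
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask

# Small weights collapse to 0; large ones become +/-alpha.
w = np.array([0.9, -0.05, 0.4, -0.8, 0.02])
print(ternarize(w))
```

Storing only the sign pattern plus one scale per tensor is what yields the memory and compute savings: the dense float matrix is replaced by a 2-bit code and a single float.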


Related research

- Hyperspherical Quantization: Toward Smaller and More Accurate Models (12/24/2022). Model quantization enables the deployment of deep neural networks under ...
- Smart Ternary Quantization (09/26/2019). Neural network models are resource hungry. Low bit quantization such as ...
- Robust Quantization: One Model to Rule Them All (02/18/2020). Neural network quantization methods often involve simulating the quantiz...
- Confounding Tradeoffs for Neural Network Quantization (02/12/2021). Many neural network quantization techniques have been developed to decre...
- A review of learning vector quantization classifiers (09/23/2015). In this work we present a review of the state of the art of Learning Vec...
- Mixed-Precision Inference Quantization: Radically Towards Faster inference speed, Lower Storage requirement, and Lower Loss (07/20/2022). Based on the model's resilience to computational noise, model quantizati...
- A Survey of Quantization Methods for Efficient Neural Network Inference (03/25/2021). As soon as abstract mathematical computations were adapted to computatio...
