Pruning Ternary Quantization

07/23/2021
by Dan Liu, et al.

We propose pruning ternary quantization (PTQ), a simple yet effective symmetric ternary quantization method. The method significantly compresses neural network weights to sparse ternary values in {-1, 0, 1} and thus reduces computational, storage, and memory footprints. We show that PTQ can convert regular weights to ternary orthonormal bases by simply using pruning and L2 projection. In addition, we introduce a refined straight-through estimator to finalize and stabilize the quantized weights. Our method can provide at most a 46x compression ratio on the ResNet-18 structure with an acceptable accuracy of 65.36%. It compresses a ResNet-18 model from 46 MB to 955 KB (~48x) and a ResNet-50 model from 99 MB to 3.3 MB (~30x), while the top-1 accuracy on ImageNet drops only slightly from 69.7% to 65.3%. Our method unifies pruning and quantization and thus provides a range of size-accuracy trade-offs.
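The abstract describes a recipe of pruning small weights to zero and quantizing the survivors to a symmetric pair of values, while training through the discrete step with a straight-through estimator. Below is a minimal, hypothetical PyTorch sketch of that general idea; the magnitude-based pruning threshold, the mean-magnitude scale `alpha`, and the masked-gradient STE are common conventions assumed here for illustration, not the paper's exact PTQ algorithm (which uses L2 projection and a refined STE).

```python
import torch


class TernaryQuantSTE(torch.autograd.Function):
    """Symmetric ternary quantizer with a straight-through estimator.

    Weights whose magnitude falls below a pruning threshold are set to 0;
    the rest are mapped to +/- alpha, where alpha is the mean magnitude of
    the surviving weights. This is a common convention, not necessarily
    the scaling used in the PTQ paper.
    """

    @staticmethod
    def forward(ctx, w, sparsity):
        # Threshold chosen so that roughly `sparsity` of the weights become 0.
        k = max(int(sparsity * w.numel()), 1)
        threshold = w.abs().flatten().kthvalue(k).values
        mask = (w.abs() > threshold).to(w.dtype)
        # Scale surviving weights by their mean magnitude.
        alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
        ctx.save_for_backward(mask)
        return alpha * torch.sign(w) * mask  # values in {-alpha, 0, +alpha}

    @staticmethod
    def backward(ctx, grad_out):
        # Straight-through estimator: pass gradients only to unpruned weights.
        (mask,) = ctx.saved_tensors
        return grad_out * mask, None


def ternarize(w, sparsity=0.5):
    return TernaryQuantSTE.apply(w, sparsity)


if __name__ == "__main__":
    w = torch.randn(64, 64, requires_grad=True)
    q = ternarize(w, sparsity=0.5)
    q.sum().backward()
    print(q.unique())          # roughly {-alpha, 0, +alpha}
    print(w.grad.abs().max())  # gradients flow only through surviving weights
```

In a quantization-aware training loop, `ternarize` would wrap each layer's weight tensor in the forward pass, so the network trains against the ternary weights while the full-precision copies receive the straight-through gradients.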


Related research

02/03/2020 · Automatic Pruning for Quantized Neural Networks
Neural network quantization and pruning are two techniques commonly used...

10/05/2020 · Joint Pruning Quantization for Extremely Sparse Neural Networks
We investigate pruning and quantization for deep neural networks. Our go...

04/09/2022 · Channel Pruning In Quantization-aware Training: An Adaptive Projection-gradient Descent-shrinkage-splitting Method
We propose an adaptive projection-gradient descent-shrinkage-splitting m...

11/30/2019 · Pruning at a Glance: Global Neural Pruning for Model Compression
Deep Learning models have become the dominant approach in several areas ...

07/12/2019 · And the Bit Goes Down: Revisiting the Quantization of Neural Networks
In this paper, we address the problem of reducing the memory footprint o...

06/20/2023 · DynaQuant: Compressing Deep Learning Training Checkpoints via Dynamic Quantization
With the increase in the scale of Deep Learning (DL) training workloads ...

03/07/2019 · Efficient and Effective Quantization for Sparse DNNs
Deep convolutional neural networks (CNNs) are powerful tools for a wide ...
