Hyperspherical Quantization: Toward Smaller and More Accurate Models

12/24/2022
by Dan Liu, et al.

Model quantization enables the deployment of deep neural networks on resource-constrained devices. Vector quantization reduces model size by indexing model weights with full-precision embeddings (codewords), but the indices must be restored to 32-bit values during computation. Binary and other low-precision quantization methods can reduce model size by up to 32×, but at the cost of a considerable accuracy drop. In this paper, we propose an efficient framework for ternary quantization that produces smaller and more accurate compressed models. By integrating hyperspherical learning, pruning, and reinitialization, our proposed Hyperspherical Quantization (HQ) method reduces the cosine distance between the full-precision and ternary weights, thereby reducing the bias of the straight-through gradient estimator during ternary quantization. Compared with existing work at similar compression levels (∼30×, ∼40×), our method significantly improves test accuracy and reduces model size.
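To make the quantity in the abstract concrete, the following is a minimal sketch of ternarizing a weight vector and measuring its cosine distance to the full-precision original. It is not the HQ method itself: the helper names (`ternarize`, `cosine_similarity`), the toy weights, and the threshold rule (0.7 times the mean absolute weight, a heuristic borrowed from earlier ternary-weight work) are all assumptions chosen for illustration.

```python
import math


def ternarize(weights, threshold):
    """Map each weight to -1, 0, or +1 using a magnitude threshold.

    Hypothetical helper: weights with |w| <= threshold become 0,
    the rest keep only their sign.
    """
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for a zero vector; avoids division by zero
    return dot / (norm_a * norm_b)


# Toy full-precision weights (made up for this example).
weights = [0.9, -0.8, 0.05, -0.02, 0.7, 0.01]

# Assumed threshold rule: 0.7 * mean(|w|). The paper may use a different one.
threshold = 0.7 * sum(abs(w) for w in weights) / len(weights)

ternary = ternarize(weights, threshold)

# Cosine distance = 1 - cosine similarity. A smaller value means the ternary
# vector points in nearly the same direction as the full-precision one.
cos_dist = 1.0 - cosine_similarity(weights, ternary)

print("ternary weights:", ternary)
print("cosine distance:", cos_dist)
```

In this toy case the small-magnitude weights are zeroed and the large ones keep their sign, so the ternary vector stays close in direction to the original. The abstract's claim is that shrinking this distance during training reduces the bias of the straight-through gradient estimator.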


research
12/18/2019

Adaptive Loss-aware Quantization for Multi-bit Networks

We investigate the compression of deep neural networks by quantizing the...
research
03/02/2023

Ternary Quantization: A Survey

Inference time, model size, and accuracy are critical for deploying deep...
research
12/08/2021

Neural Network Quantization for Efficient Inference: A Survey

As neural networks have become more powerful, there has been a rising de...
research
02/24/2022

Standard Deviation-Based Quantization for Deep Neural Networks

Quantization of deep neural networks is a promising approach that reduce...
research
10/01/2018

ProxQuant: Quantized Neural Networks via Proximal Operators

To make deep neural networks feasible in resource-constrained environmen...
research
11/13/2019

DupNet: Towards Very Tiny Quantized CNN with Improved Accuracy for Face Detection

Deploying deep learning based face detectors on edge devices is a challe...
research
06/12/2023

Efficient Quantization-aware Training with Adaptive Coreset Selection

The expanding model size and computation of deep neural networks (DNNs) ...
