DeepAI AI Chat
Log In Sign Up

Efficient and Effective Quantization for Sparse DNNs

03/07/2019
by   Yiren Zhao, et al.
University of Cambridge
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
0

Deep convolutional neural networks (CNNs) are powerful tools for a wide range of vision tasks, but the enormous amount of memory and compute resources required by CNNs poses a challenge in deploying them on constrained devices. Existing compression techniques show promising performance in reducing the size and computation complexity of CNNs for efficient inference, but there lacks a method to integrate them effectively. In this paper, we attend to the statistical properties of sparse CNNs and present focused quantization, a novel quantization strategy based on powers-of-two values, which exploits the weight distributions after fine-grained pruning. The proposed method dynamically discovers the most effective numerical representation for weights in layers with varying sparsities, to minimize the impact of quantization on the task accuracy. Multiplications in quantized CNNs can be replaced with much cheaper bit-shift operations for efficient inference. Coupled with lossless encoding, we build a compression pipeline that provides CNNs high compression ratios (CR) and minimal loss in accuracies. In ResNet-50, we achieve a 18.08 × CR with only 0.24% loss in top-5 accuracy, outperforming existing compression pipelines.

READ FULL TEXT

page 1

page 2

page 3

page 4

03/04/2022

Structured Pruning is All You Need for Pruning CNNs at Initialization

Pruning is a popular technique for reducing the model size and computati...
12/29/2018

Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are state-of-the-art in numerous co...
08/29/2019

Smaller Models, Better Generalization

Reducing network complexity has been a major research focus in recent ye...
07/23/2021

Pruning Ternary Quantization

We propose pruning ternary quantization (PTQ), a simple, yet effective, ...
11/24/2019

Pyramid Vector Quantization and Bit Level Sparsity in Weights for Efficient Neural Networks Inference

This paper discusses three basic blocks for the inference of convolution...
06/22/2020

Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization

Pruning and quantization are proven methods for improving the performanc...
12/20/2019

EAST: Encoding-Aware Sparse Training for Deep Memory Compression of ConvNets

The implementation of Deep Convolutional Neural Networks (ConvNets) on t...

Code Repositories

mayo

Mayo: Auto-generation of hardware-friendly deep neural networks.


view repo