Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization

06/22/2020
by Yuan Wen et al.

Pruning and quantization are proven methods for improving the performance and storage efficiency of convolutional neural networks (CNNs). Pruning removes near-zero weights in tensors and masks weak connections between neurons in neighbouring layers. Quantization reduces the precision of weights by replacing them with numerically similar values that require less storage. In this paper, we identify another form of redundancy in CNN weight tensors: repeated patterns of similar values. We observe that pruning and quantization both tend to drastically increase the number of repeated patterns in the weight tensors. We investigate several compression schemes to take advantage of this structure in CNN weight data, including multiple forms of Huffman coding, and other approaches inspired by block sparse matrix formats. We evaluate our approach on several well-known CNNs and find that we can achieve compaction ratios of 1.4x to 3.1x in addition to the savings from pruning and quantization.
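The following is a minimal sketch, not the paper's implementation, of the effect the abstract describes: after pruning and quantization, many small slices of a convolutional weight tensor become identical, so an entropy code over whole slices can shrink storage further. The toy tensor shape, pruning threshold, bit-width, choice of rows as the repeated pattern unit, and the plain Huffman coder (rather than the paper's block-sparse-inspired formats) are all assumptions made for illustration.

```python
# Sketch: prune and quantize a toy conv weight tensor, count repeated row
# patterns, and estimate how many bits a Huffman code over those rows needs.
import heapq
from collections import Counter

import numpy as np


def prune_and_quantize(weights, prune_threshold=0.05, n_levels=16):
    """Zero near-zero weights, then map the rest onto a small set of integer levels."""
    w = np.where(np.abs(weights) < prune_threshold, 0.0, weights)
    scale = float(np.max(np.abs(w))) or 1.0
    return np.round(w / scale * (n_levels // 2)).astype(np.int8)


def huffman_total_bits(freqs):
    """Total bits needed to Huffman-encode symbols occurring with counts `freqs`."""
    heap = list(freqs)
    heapq.heapify(heap)
    if len(heap) == 1:
        return heap[0]  # degenerate alphabet: one bit per occurrence
    total = 0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b  # every merge adds one bit to each symbol beneath it
        heapq.heappush(heap, a + b)
    return total


rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(64, 3, 3, 3))  # toy conv layer: 64 filters of 3x3x3

q = prune_and_quantize(weights)

# Treat each length-3 row of a filter as a "pattern" and count how often it repeats.
rows = [tuple(r) for r in q.reshape(-1, q.shape[-1])]
counts = Counter(rows)
print(f"{len(rows)} rows, {len(counts)} distinct patterns after pruning + quantization")

raw_bits = q.size * 8  # int8 storage for every weight
coded_bits = huffman_total_bits(counts.values())
print(f"Huffman over repeated rows: {raw_bits / coded_bits:.1f}x compaction (codebook overhead ignored)")
```

The ratio printed here ignores the cost of storing the pattern codebook itself; the sketch only illustrates why repeated patterns created by pruning and quantization leave room for further compression.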


