An Inter-Layer Weight Prediction and Quantization for Deep Neural Networks based on a Smoothly Varying Weight Hypothesis

07/16/2019
by Kang-Ho Lee, et al.

Network compression for deep neural networks has become an important part of deep learning research because of the growing demand for deep learning models in practical, resource-constrained environments. In this paper, we observe that the weights in adjacent convolution layers share strong similarity in shape and value, i.e., the weights tend to vary smoothly along the layers. We call this phenomenon the Smoothly Varying Weight Hypothesis (SVWH). Based on SVWH and the inter-frame prediction used in conventional video coding schemes, we propose a new Inter-Layer Weight Prediction (ILWP) and quantization method that quantizes the predicted weight residuals. Since these residuals tend to follow a Laplacian distribution with very small variance, quantization can be applied more effectively, producing more zero weights and improving the weight compression ratio. In addition, we propose a new loss for eliminating non-texture bits, which enables us to store only the texture bits more efficiently. That is, the proposed loss regularizes the weights such that the collocated weights of two adjacent layers take the same values. Our comprehensive experiments show that the proposed method achieves a much higher weight compression rate at the same accuracy level than previous quantization-based compression methods for deep neural networks.
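To make the idea concrete, the following PyTorch sketch illustrates the two ingredients the abstract describes: predicting a layer's weights from the collocated weights of the previous layer and quantizing only the residual, plus a regularization term that pulls collocated weights of adjacent layers toward equal values so that residuals quantize to zero. This is a minimal sketch under assumed details; the function names (predict_and_quantize, ilwp_regularizer), the quantization step size, and the regularization weight are hypothetical and not taken from the authors' implementation.

import torch
import torch.nn as nn

def predict_and_quantize(prev_w: torch.Tensor, cur_w: torch.Tensor, q: float = 0.01):
    # Predict the current layer's weights from the previous layer's collocated
    # weights and uniformly quantize the residual, which SVWH suggests is
    # concentrated near zero (hypothetical step size q).
    residual = cur_w - prev_w                     # inter-layer prediction residual
    q_residual = torch.round(residual / q) * q    # uniform quantization of the residual
    reconstructed = prev_w + q_residual           # reconstruction from prediction + residual
    return q_residual, reconstructed

def ilwp_regularizer(layers, lam: float = 1e-4):
    # Regularization loss pulling collocated weights of adjacent layers toward
    # equal values, so that more residuals quantize to zero (fewer non-zero bits).
    # The weighting factor lam is an assumption for illustration.
    loss = torch.tensor(0.0)
    for prev, cur in zip(layers[:-1], layers[1:]):
        loss = loss + (cur.weight - prev.weight).pow(2).mean()
    return lam * loss

# Usage: two adjacent 3x3 conv layers with identical shapes (assumed for illustration).
conv1 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
res, rec = predict_and_quantize(conv1.weight.data, conv2.weight.data)
reg = ilwp_regularizer([conv1, conv2])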

Related research

05/29/2018 · Retraining-Based Iterative Weight Quantization for Deep Neural Networks
07/01/2019 · Weight Normalization based Quantization for Deep Neural Network Compression
07/01/2017 · Structured Sparse Ternary Weight Coding of Deep Neural Networks for Efficient Hardware Implementations
09/20/2023 · SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization
09/30/2018 · Minimal Random Code Learning: Getting Bits Back from Compressed Model Parameters
10/24/2022 · Weight Fixing Networks
05/03/2019 · Compressibility Loss for Neural Network Weights
