Scalable Compression of Deep Neural Networks

08/26/2016
by   Xing Wang, et al.

Deep neural networks generally involve some layers with millions of parameters, making them difficult to deploy and update on devices with limited resources, such as mobile phones and other smart embedded systems. In this paper, we propose a scalable representation of the network parameters, so that different applications can select the most suitable bit rate of the network based on their own storage constraints. Moreover, when a device needs to upgrade to a high-rate network, the existing low-rate network can be reused, and only some incremental data need to be downloaded. We first hierarchically quantize the weights of a pre-trained deep neural network to enforce weight sharing. Next, we adaptively select the bits assigned to each layer given the total bit budget. After that, we retrain the network to fine-tune the quantized centroids. Experimental results show that our method can achieve scalable compression with graceful degradation in performance.
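The core of the quantization step is weight sharing: each layer's weights are clustered so that many weights map to a small codebook of shared centroids, and the layer can then be stored as low-bit indices plus the codebook. The sketch below illustrates this idea with plain k-means clustering; it is a simplified stand-in, not the paper's hierarchical scheme, and the function name and parameters are illustrative assumptions.

```python
import numpy as np

def quantize_weights(weights, bits, iters=20):
    """Cluster weights into 2**bits shared centroids (plain 1-D k-means).

    A simplified illustration of quantization-based weight sharing:
    each weight is replaced by its nearest centroid, so the layer can be
    stored as `bits`-bit indices plus a small centroid codebook.
    """
    flat = weights.ravel()
    k = 2 ** bits
    # Initialize centroids uniformly over the range of the weights.
    centroids = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assignment step: each weight goes to its nearest centroid.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Update step: move each centroid to the mean of its weights.
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = flat[idx == j].mean()
    return idx.reshape(weights.shape), centroids

# Usage: quantize a toy 4x4 "layer" down to 2 bits (4 shared values).
W = np.array([[0.10, -0.20, 0.90, 0.05],
              [0.80, -0.90, 0.11, -0.15],
              [0.85, 0.02, -0.88, 0.12],
              [-0.21, 0.95, -0.10, 0.00]])
idx, codebook = quantize_weights(W, bits=2)
W_hat = codebook[idx]  # reconstructed (quantized) layer
```

In the paper's scalable setting, the codebook is built hierarchically so that a low-rate network's centroids are refined, rather than replaced, when upgrading to a higher rate; the retraining step then fine-tunes the centroid values with the index assignments fixed.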


Related research

10/01/2015 | Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
Neural networks are both computationally intensive and memory intensive,...

01/25/2022 | Bit-serial Weight Pools: Compression and Arbitrary Precision Execution of Neural Networks on Resource Constrained Processors
Applications of neural networks on edge systems have proliferated in rec...

06/18/2021 | Quantized Neural Networks via -1, +1 Encoding Decomposition and Acceleration
The training of deep neural networks (DNNs) always requires intensive re...

02/07/2018 | Spatially adaptive image compression using a tiled deep network
Deep neural networks represent a powerful class of function approximator...

05/24/2018 | Multi-Task Zipping via Layer-wise Neuron Sharing
Future mobile devices are anticipated to perceive, understand and react ...

05/31/2019 | Multi-Precision Quantized Neural Networks via Encoding Decomposition of -1 and +1
The training of deep neural networks (DNNs) requires intensive resources...

05/25/2018 | Heterogeneous Bitwidth Binarization in Convolutional Neural Networks
Recent work has shown that fast, compact low-bitwidth neural networks ca...
