ABS: Automatic Bit Sharing for Model Compression

01/13/2021
by Jing Liu, et al.

We present Automatic Bit Sharing (ABS) to automatically search for optimal model compression configurations (e.g., pruning ratio and bitwidth). Unlike previous works that consider model pruning and quantization separately, we seek to optimize them jointly. To deal with the resultant large design space, we propose a novel single-path super-bit model that encodes all candidate compression configurations, rather than maintaining a separate path for each configuration. Specifically, we first propose a novel decomposition of quantization that encapsulates all the candidate bitwidths in the search space. Starting from a low bitwidth, we sequentially reach higher bitwidths by recursively adding re-assignment offsets. We then introduce learnable binary gates to encode the choice of bitwidth, including a filter-wise 0-bit option for pruning. By training the binary gates jointly with the network parameters, the compression configuration of each layer can be determined automatically. Our ABS brings two benefits for model compression: 1) It sidesteps the combinatorially large design space, reducing the number of trainable parameters and the search cost. 2) It avoids directly fitting an extremely low-bit quantizer to the data, greatly reducing the optimization difficulty caused by non-differentiable quantization. Experiments on CIFAR-100 and ImageNet show that our method achieves significant computational cost reduction while preserving promising performance.
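The decomposition and gating scheme described above can be made concrete with a short sketch. The PyTorch snippet below is a minimal illustration under stated assumptions, not the authors' implementation: the class name SuperBitQuantizer, the bitwidth set (2, 4, 8), the symmetric uniform quantizer, and the sigmoid-based hard gates with a straight-through estimator are all hypothetical choices made for clarity.

```python
import torch
import torch.nn as nn


def ste_round(x):
    """Round with a straight-through gradient estimator (STE)."""
    return x + (torch.round(x) - x).detach()


class SuperBitQuantizer(nn.Module):
    """Hypothetical sketch of a super-bit weight quantizer.

    Weights are first quantized at the lowest candidate bitwidth; higher
    bitwidths are reached by recursively adding residual ("re-assignment")
    offsets, each guarded by a learnable binary gate. In the full method,
    a filter-wise 0-bit gate would additionally prune filters.
    """

    def __init__(self, bitwidths=(2, 4, 8)):
        super().__init__()
        self.bitwidths = bitwidths
        # One gate logit per bitwidth level beyond the lowest one.
        self.gate_logits = nn.Parameter(torch.zeros(len(bitwidths) - 1))
        self.scale = nn.Parameter(torch.tensor(1.0))

    def _quantize(self, w, bits):
        # Symmetric uniform quantization of clipped weights.
        levels = 2 ** (bits - 1) - 1
        w_clip = torch.clamp(w / self.scale, -1.0, 1.0)
        return ste_round(w_clip * levels) / levels * self.scale

    def forward(self, w):
        # Start from the lowest-bit approximation.
        w_q = self._quantize(w, self.bitwidths[0])
        for i, bits in enumerate(self.bitwidths[1:]):
            # Hard binary gate with a straight-through gradient.
            g = torch.sigmoid(self.gate_logits[i])
            gate = g + ((g > 0.5).float() - g).detach()
            # Residual offset that refines w_q toward the next bitwidth.
            offset = self._quantize(w - w_q, bits)
            w_q = w_q + gate * offset
        return w_q
```

In a full search, one such module per layer (together with filter-wise 0-bit gates for pruning) would be trained jointly with the network weights; after training, the surviving gates read off the per-layer compression configuration, so only a single path is ever maintained rather than one path per candidate configuration.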
