Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and Adaptive Inference Approach

by Chen Tang, et al.

Conventional model quantization methods apply one fixed quantization scheme to all data samples, which ignores the inherent differences in "recognition difficulty" between samples. We propose to process different data samples with different quantization schemes, achieving data-dependent dynamic inference at a fine-grained layer level. However, enabling this adaptive inference with changeable layer-wise quantization schemes is challenging because the number of combinations of bit-widths and layers grows exponentially, making it extremely difficult to train a single model in such a vast search space and to use it in practice. To solve this problem, we present the Arbitrary Bit-width Network (ABN), where the bit-widths of a single deep network can change at runtime for different data samples, with layer-wise granularity. Specifically, we first build a weight-shared, layer-wise quantizable "super-network" in which each layer can be assigned multiple bit-widths and thus quantized differently on demand. The super-network provides a considerably large number of combinations of bit-widths and layers, each of which can be used during inference without retraining or storing myriad models. Second, based on the well-trained super-network, each layer's runtime bit-width selection decision is modeled as a Markov Decision Process (MDP) and solved by an adaptive inference strategy accordingly. Experiments show that the super-network can be built without accuracy degradation, and that the bit-width allocation of each layer can be adjusted on the fly to deal with various inputs. On ImageNet classification, we achieve 1.1 while saving 36.2
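The two ideas in the abstract, a weight-shared super-network quantized per layer on demand and a per-layer bit-width decision made at runtime, can be illustrated with a small toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `quantize`, `SuperNetwork`, and `threshold_policy` are hypothetical, the quantizer is plain uniform symmetric rounding, and the policy is a hand-written placeholder standing in for the learned MDP solution.

```python
def quantize(weights, bits):
    """Uniform symmetric quantization of a flat list of floats (illustrative only)."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1              # e.g. 7 positive levels at 4 bits
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


class SuperNetwork:
    """Toy weight-shared network: one full-precision matrix per layer,
    quantized to whichever bit-width is requested at inference time."""

    def __init__(self, layer_weights, candidate_bits):
        self.layer_weights = layer_weights    # list of matrices (lists of rows)
        self.candidate_bits = tuple(candidate_bits)

    def num_configurations(self):
        # One choice per layer, so the count grows exponentially with depth.
        return len(self.candidate_bits) ** len(self.layer_weights)

    def forward(self, x, policy):
        chosen = []
        for weights in self.layer_weights:
            # Sequential decision: the current activations act as the state,
            # the bit-width for this layer is the action.
            bits = policy(x, self.candidate_bits)
            chosen.append(bits)
            n_in = len(weights[0])
            flat = quantize([w for row in weights for w in row], bits)
            # Dense layer followed by ReLU, using the quantized weights.
            x = [
                max(0.0, sum(flat[i * n_in + j] * x[j] for j in range(n_in)))
                for i in range(len(weights))
            ]
        return x, chosen


def threshold_policy(state, candidate_bits):
    """Placeholder rule, NOT the learned policy: an arbitrary threshold on the
    mean activation magnitude picks the highest or lowest candidate bit-width."""
    energy = sum(abs(v) for v in state) / len(state)
    return max(candidate_bits) if energy < 1.0 else min(candidate_bits)


net = SuperNetwork(
    layer_weights=[
        [[0.5, -0.25], [0.75, 1.0]],
        [[1.0, -0.5], [0.25, 0.5]],
    ],
    candidate_bits=(2, 4, 8),
)
output, bits_per_layer = net.forward([1.0, 2.0], threshold_policy)
print(net.num_configurations(), bits_per_layer, output)
```

With three candidate bit-widths and two layers, the toy network already exposes 3^2 = 9 configurations from a single set of stored weights, which is the property that makes storing one model per configuration unnecessary. In a real system the placeholder policy would be replaced by one trained against an accuracy and cost trade-off.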




DNQ: Dynamic Network Quantization

Network quantization is an effective method for the deployment of neural...

Mixed-Precision Quantization with Cross-Layer Dependencies

Quantization is commonly used to compress and accelerate deep neural net...

Bit-Mixer: Mixed-precision networks with runtime bit-width selection

Mixed-precision networks allow for a variable bit-width quantization for...

Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance

The exponentially large discrete search space in mixed-precision quantiz...

UWC: Unit-wise Calibration Towards Rapid Network Compression

This paper introduces a post-training quantization (PTQ) method achievin...

Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference

While increasingly deep networks are still in general desired for achiev...

MultiQuant: A Novel Multi-Branch Topology Method for Arbitrary Bit-width Network Quantization

Arbitrary bit-width network quantization has received significant attent...
