Fractional Skipping: Towards Finer-Grained Dynamic CNN Inference

by Jianghao Shen, et al.

While increasingly deep networks are still generally desired for achieving state-of-the-art performance, a simpler network may already suffice for many specific inputs. Existing works exploited this observation by learning to skip convolutional layers in an input-dependent manner. However, we argue that their binary decision scheme, i.e., either fully executing or completely bypassing a layer for a given input, can be improved by introducing finer-grained, "softer" decisions. We therefore propose a Dynamic Fractional Skipping (DFS) framework. The core idea of DFS is the hypothesis that layer-wise quantization (to different bitwidths) can serve as intermediate "soft" choices between fully utilizing and skipping a layer. For each input, DFS dynamically assigns a bitwidth to both the weights and the activations of each layer, where fully executing and skipping can be viewed as the two "extremes" (i.e., full bitwidth and zero bitwidth). In this way, DFS can "fractionally" exploit a layer's expressive power during input-adaptive inference, enabling finer-grained trade-offs between accuracy and computational cost. It offers a unified view linking input-adaptive layer skipping and input-adaptive hybrid quantization. Extensive experimental results demonstrate the superior trade-off between computational cost and model expressive power (accuracy) achieved by DFS. Further visualizations indicate a smooth and consistent transition in DFS behavior, especially in the learned choices between layer skipping and different quantization levels as the total computational budget varies, validating our hypothesis that layer quantization can be viewed as an intermediate variant of layer skipping. Our source code and supplementary material are available at
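The abstract describes each layer's execution as a per-input bitwidth choice, with skipping (zero bits) and full execution (full bits) as the two extremes. Below is a minimal Python sketch of that idea under our own assumptions; it is not the authors' implementation. The candidate set `BIT_CHOICES`, the symmetric uniform quantizer, the argmax gate `pick_bitwidth`, the residual layer form y = x + relu(Wx), and the `bit_ops` cost proxy (MACs times weight bits times activation bits) are all hypothetical illustrations.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical candidate bitwidths: 0 = skip the layer, 32 = full precision.
BIT_CHOICES = (0, 4, 8, 32)


def quantize(values: Sequence[float], bits: int) -> Vector:
    """Symmetric uniform quantization of a vector to `bits` bits (assumed scheme)."""
    if bits >= 32:
        return list(values)  # treat 32 bits as full precision
    if bits < 2:
        raise ValueError("bits must be 0 (skip), >= 2, or >= 32 (full precision)")
    scale = max((abs(v) for v in values), default=0.0)
    if scale == 0.0:
        return [0.0 for _ in values]
    levels = 2 ** (bits - 1) - 1  # e.g. 127 levels per sign for 8 bits
    return [round(v / scale * levels) / levels * scale for v in values]


def pick_bitwidth(gate_scores: Sequence[float]) -> int:
    """Hypothetical per-input gate: pick the bitwidth with the highest score."""
    best = max(range(len(BIT_CHOICES)), key=lambda i: gate_scores[i])
    return BIT_CHOICES[best]


def residual_layer(x: Vector, weight: Matrix, bits: int) -> Vector:
    """y = x + relu(W x), computed with `bits`-bit weights and activations.

    bits == 0 is the "zero bitwidth" extreme: the layer is skipped and the
    residual connection passes x through unchanged.
    """
    if bits == 0:
        return list(x)
    xq = quantize(x, bits)  # quantized activations
    out = []
    for row, xi in zip(weight, x):
        wq = quantize(row, bits)  # one scale per output row (assumption)
        acc = sum(w * a for w, a in zip(wq, xq))
        out.append(xi + max(acc, 0.0))
    return out


def bit_ops(macs: int, bits: int) -> int:
    """Rough cost proxy: MACs x weight bits x activation bits."""
    return macs * bits * bits


# Usage: the same layer under two different (made-up) gate decisions.
x = [1.0, -2.0, 0.5]
W = [[0.5, 0.0, 0.0], [0.0, 0.25, 0.0], [0.1, 0.2, 0.3]]
for scores in ([0.9, 0.1, 0.0, 0.0], [0.0, 0.1, 0.2, 0.7]):
    b = pick_bitwidth(scores)
    print(b, bit_ops(macs=9, bits=b), residual_layer(x, W, b))
# 0 0 [1.0, -2.0, 0.5]
# 32 9216 [1.5, -2.0, 0.5]
```

In this sketch, a skipped layer still leaves the residual path intact, so bitwidth 0 costs nothing under the proxy while the output stays well-defined. That is what lets skipping and the intermediate bitwidths sit on a single cost scale.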


Dual Dynamic Inference: Enabling More Efficient, Adaptive and Controllable Deep Inference

State-of-the-art convolutional neural networks (CNNs) yield record-break...

FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training

Recent breakthroughs in deep neural networks (DNNs) have fueled a tremen...

ProgressiveSpinalNet architecture for FC layers

In deep learning models, the FC (fully connected) layer has the biggest import...

Arbitrary Bit-width Network: A Joint Layer-Wise Quantization and Adaptive Inference Approach

Conventional model quantization methods use a fixed quantization scheme ...

MIA-Former: Efficient and Robust Vision Transformers via Multi-grained Input-Adaptation

ViTs are often too computationally expensive to be fitted onto real-worl...

FactorizeNet: Progressive Depth Factorization for Efficient Network Architecture Exploration Under Quantization Constraints

Depth factorization and quantization have emerged as two of the principa...

SPIQ: Data-Free Per-Channel Static Input Quantization

Computationally expensive neural networks are ubiquitous in computer vis...
