Few Shot Network Compression via Cross Distillation

11/21/2019
by Haoli Bai, et al.

Model compression is widely adopted to obtain lightweight deep neural networks. Most prevalent methods, however, require fine-tuning with sufficient training data to ensure accuracy, which can be problematic when privacy and security concerns restrict data access. As a compromise between privacy and performance, in this paper we investigate few-shot network compression: given only a few samples per class, how can we effectively compress the network with negligible performance drop? The core challenge of few-shot network compression lies in the high estimation errors of the compressed network relative to the original one, since the compressed network easily over-fits the few training instances. These estimation errors propagate and accumulate layer by layer and ultimately deteriorate the network output. To address the problem, we propose cross distillation, a novel layer-wise knowledge distillation approach. By interweaving the hidden layers of the teacher and student networks, the layer-wise accumulation of estimation errors can be effectively reduced. The proposed method offers a general framework compatible with prevalent network compression techniques such as pruning. Extensive experiments on benchmark datasets demonstrate that cross distillation significantly improves the student network's accuracy when only a few training instances are available.
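To give a rough sense of what "interweaving hidden layers" means in practice, the sketch below shows a minimal layer-wise cross distillation loss in PyTorch. It is an illustrative assumption, not the authors' exact formulation: the function name, the MSE losses, and the weighting parameter `mu` are hypothetical, and the teacher/student layers are assumed to produce hidden states of matching shape.

```python
# Minimal sketch of layer-wise cross distillation (illustrative only; the
# loss terms, weighting, and layer pairing are assumptions, not the paper's
# exact formulation).
import torch
import torch.nn.functional as F

def cross_distillation_loss(teacher_layers, student_layers, x, mu=0.5):
    """Accumulate layer-wise losses while interweaving teacher/student paths.

    teacher_layers, student_layers: lists of nn.Module with matching output shapes.
    x: a batch drawn from the few available training samples.
    mu: hypothetical weight balancing the two cross terms.
    """
    h_t, h_s = x, x          # teacher / student hidden states
    loss = 0.0
    for f_t, f_s in zip(teacher_layers, student_layers):
        with torch.no_grad():
            h_t_next = f_t(h_t)              # teacher forward pass (frozen)
        h_s_next = f_s(h_s)                  # student forward pass
        # "Correction" term: the student layer is fed the teacher's hidden
        # input, so errors accumulated in earlier student layers do not
        # propagate into this layer's target.
        correction = F.mse_loss(f_s(h_t), h_t_next)
        # "Imitation" term: the student's own output is matched against the
        # teacher's output at the same depth.
        imitation = F.mse_loss(h_s_next, h_t_next)
        loss = loss + mu * correction + (1.0 - mu) * imitation
        h_t, h_s = h_t_next, h_s_next
    return loss
```

In a pruning setting, the student would typically be a pruned copy of the teacher, and only the few-shot samples drive this loss; the key point is that each student layer is partly supervised on the teacher's hidden activations rather than on its own error-laden ones, which is how layer-wise error accumulation is kept in check.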

