Progressive Network Grafting for Few-Shot Knowledge Distillation

12/09/2020
by Chengchao Shen, et al.

Knowledge distillation has demonstrated encouraging performance in deep model compression. Most existing approaches, however, require massive labeled data to accomplish the knowledge transfer, making model compression a cumbersome and costly process. In this paper, we investigate the practical few-shot knowledge distillation scenario, where we assume that only a few samples without human annotations are available for each category. To this end, we introduce a principled dual-stage distillation scheme tailored for few-shot data. In the first stage, we graft the student blocks one by one onto the teacher and learn the parameters of each grafted block intertwined with those of the surrounding teacher blocks. In the second stage, the trained student blocks are progressively connected and then grafted together onto the teacher network, allowing the learned student blocks to adapt to one another and eventually replace the teacher network. Experiments demonstrate that our approach, with only a few unlabeled samples, achieves gratifying results on CIFAR10, CIFAR100, and ILSVRC-2012. On CIFAR10 and CIFAR100, our performance is even on par with that of knowledge distillation schemes that utilize the full datasets. The source code is available at https://github.com/zju-vipa/NetGraft.
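To make the dual-stage scheme concrete, below is a minimal PyTorch-style sketch of the block-grafting idea, assuming the teacher and student are each decomposed into an aligned list of blocks and that distillation matches the hybrid network's logits to the frozen teacher's logits via a KL divergence on the few unlabeled samples. The block decomposition, loss, optimizer, and training-loop details here are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def graft(teacher_blocks, student_blocks, grafted_ids):
    """Build a hybrid network: student blocks at the given indices, teacher blocks elsewhere."""
    layers = [student_blocks[i] if i in grafted_ids else teacher_blocks[i]
              for i in range(len(teacher_blocks))]
    return nn.Sequential(*layers)


def distill_step(hybrid, teacher, x, optimizer):
    """One distillation step on an unlabeled batch x: match hybrid logits to teacher logits."""
    with torch.no_grad():
        t_prob = F.softmax(teacher(x), dim=1)          # soft targets from the frozen teacher
    s_log_prob = F.log_softmax(hybrid(x), dim=1)
    loss = F.kl_div(s_log_prob, t_prob, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def stage1(teacher, teacher_blocks, student_blocks, loader, epochs=10):
    """Stage 1: graft one student block at a time; only that block's parameters are updated."""
    for i, block in enumerate(student_blocks):
        hybrid = graft(teacher_blocks, student_blocks, {i})
        opt = torch.optim.Adam(block.parameters(), lr=1e-3)
        for _ in range(epochs):
            for x in loader:                           # loader yields unlabeled few-shot batches
                distill_step(hybrid, teacher, x, opt)


def stage2(teacher, teacher_blocks, student_blocks, loader, epochs=10):
    """Stage 2: progressively connect the first k trained student blocks and fine-tune them
    jointly while grafted onto the teacher, until the student replaces the teacher entirely."""
    for k in range(1, len(student_blocks) + 1):
        grafted = set(range(k))
        hybrid = graft(teacher_blocks, student_blocks, grafted)
        params = [p for i in grafted for p in student_blocks[i].parameters()]
        opt = torch.optim.Adam(params, lr=1e-3)
        for _ in range(epochs):
            for x in loader:
                distill_step(hybrid, teacher, x, opt)
```

In this sketch the teacher itself is assumed to be the composition of its blocks (e.g. nn.Sequential(*teacher_blocks)); because the optimizer only holds the grafted student blocks' parameters, the teacher blocks inside each hybrid remain fixed during training.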

Related research

05/20/2019  Zero-Shot Knowledge Distillation in Deep Networks
07/25/2022  Black-box Few-shot Knowledge Distillation
02/09/2023  Toward Extremely Lightweight Distracted Driver Recognition With Distillation-Based Neural Architecture Search and Knowledge Transfer
01/28/2023  Few-shot Face Image Translation via GAN Prior Distillation
09/12/2022  Switchable Online Knowledge Distillation
02/05/2021  Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching
08/31/2023  MoMA: Momentum Contrastive Learning with Multi-head Attention-based Knowledge Distillation for Histopathology Image Analysis
