Knowledge Distillation from Few Samples

12/05/2018
by Tianhong Li, et al.

Current knowledge distillation methods require the full training data to distill knowledge from a large "teacher" network to a compact "student" network, by matching certain statistics between teacher and student such as softmax outputs and feature responses. This is not only time-consuming but also inconsistent with human cognition, in which children can learn knowledge from adults with only a few examples. This paper proposes a novel and simple method for knowledge distillation from few samples. Under the assumption that teacher and student have the same feature map size at each corresponding block, we add a 1x1 conv-layer at the end of each block in the student-net and align the block-level outputs between teacher and student by estimating the parameters of the added layer from the limited samples. We prove that the added layer can be absorbed/merged into the previous conv-layer, yielding a new conv-layer with the same number of parameters and the same computation cost as the previous one. Experiments verify that the proposed method is efficient and effective in distilling knowledge from teacher-nets to student-nets constructed in different ways, on various datasets.
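To make the block-level alignment concrete, below is a minimal NumPy sketch of the two steps described in the abstract: estimating the added 1x1 conv-layer from a few samples by least squares, and absorbing it into the preceding conv-layer. The function names, shapes, and the use of a least-squares solver are illustrative assumptions rather than the authors' released implementation, and the merge step assumes no non-linearity sits between the preceding conv-layer and the added 1x1 layer.

```python
# Minimal sketch of few-sample block-level alignment and 1x1-conv absorption.
# Shapes and names are illustrative assumptions, not the authors' exact code.
import numpy as np

def fit_pointwise_align(student_feat, teacher_feat):
    """Estimate a 1x1 conv (a C x C matrix Q) mapping student block outputs
    to teacher block outputs by least squares over a few samples.
    student_feat, teacher_feat: arrays of shape (N, C, H, W), same shapes."""
    n, c, h, w = student_feat.shape
    S = student_feat.transpose(1, 0, 2, 3).reshape(c, -1)   # (C, N*H*W)
    T = teacher_feat.transpose(1, 0, 2, 3).reshape(c, -1)   # (C, N*H*W)
    # Solve Q S ~= T, i.e. S^T Q^T ~= T^T, as an ordinary least-squares problem.
    Qt, *_ = np.linalg.lstsq(S.T, T.T, rcond=None)          # Qt is Q^T, (C, C)
    return Qt.T

def absorb_pointwise(conv_w, conv_b, Q):
    """Merge the estimated 1x1 conv Q into the preceding conv-layer, so the
    student keeps its original parameter count and computation cost.
    conv_w: (C_out, C_in, k, k), conv_b: (C_out,), Q: (C_out, C_out).
    Valid only if no non-linearity lies between the conv and the 1x1 layer."""
    merged_w = np.einsum('oc,cikl->oikl', Q, conv_w)         # compose channel mixes
    merged_b = Q @ conv_b
    return merged_w, merged_b
```

In this sketch, Q is estimated per block from the few available samples and then folded into the last conv-layer of that block, so the student's inference cost is unchanged after distillation.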

Related research

05/18/2018 · Recurrent knowledge distillation
Knowledge distillation compacts deep networks by letting a small student...

02/01/2022 · Local Feature Matching with Transformers for low-end devices
LoFTR arXiv:2104.00680 is an efficient deep learning method for finding ...

01/07/2022 · Compressing Models with Few Samples: Mimicking then Replacing
Few-sample compression aims to compress a big redundant model into a sma...

02/22/2023 · Debiased Distillation by Transplanting the Last Layer
Deep models are susceptible to learning spurious correlations, even duri...

07/03/2023 · Review helps learn better: Temporal Supervised Knowledge Distillation
Reviewing plays an important role when learning knowledge. The knowledge...

03/18/2023 · Crowd Counting with Online Knowledge Learning
Efficient crowd counting models are urgently required for the applicatio...

11/17/2022 · D^3ETR: Decoder Distillation for Detection Transformer
While various knowledge distillation (KD) methods in CNN-based detectors...
