Meta Learning for Knowledge Distillation

06/08/2021
by Wangchunshu Zhou, et al.

We present Meta Learning for Knowledge Distillation (MetaDistil), a simple yet effective alternative to traditional knowledge distillation (KD) methods in which the teacher model is fixed during training. We show that the teacher network can learn to transfer knowledge to the student network more effectively (i.e., learning to teach) by using feedback from the performance of the distilled student network within a meta learning framework. In addition, we introduce a pilot update mechanism that improves the alignment between the inner-learner and the meta-learner in meta learning algorithms whose goal is an improved inner-learner. Experiments on various benchmarks show that MetaDistil yields significant improvements over traditional KD algorithms and is less sensitive to student capacity and hyperparameter choices, facilitating the use of KD across different tasks and models. The code is available at https://github.com/JetRunner/MetaDistil

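To make the setup above concrete, the following is a minimal, hypothetical PyTorch sketch of one meta-distillation step in the spirit of the abstract: the teacher is updated from the loss of a pilot-updated student on a held-out "quiz" batch, and the real student is then trained with the improved teacher. The names (`meta_distil_step`, `quiz_batch`, the single-step pilot update, and the CE + KD student loss) are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
import copy
import torch
import torch.nn.functional as F
from torch.func import functional_call  # PyTorch >= 2.0

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Soft-label KL distillation loss at temperature T (a common choice)."""
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

def meta_distil_step(teacher, student, train_batch, quiz_batch,
                     student_opt, teacher_opt, student_lr=1e-3):
    x_train, y_train = train_batch
    x_quiz, y_quiz = quiz_batch

    # ---- Teacher (meta) update ------------------------------------------
    # Pilot update on a throwaway copy of the student, so the teacher's
    # meta-gradient is computed around the student state it will then teach
    # (assumed reading of the pilot update mechanism).
    pilot = copy.deepcopy(student)
    names, params = zip(*pilot.named_parameters())

    t_logits = teacher(x_train)  # keep the graph into the teacher
    s_logits = functional_call(pilot, dict(zip(names, params)), (x_train,))
    inner_loss = kd_loss(s_logits, t_logits)

    # One differentiable SGD step for the pilot student (second-order grads).
    grads = torch.autograd.grad(inner_loss, params, create_graph=True)
    updated = {n: p - student_lr * g for n, p, g in zip(names, params, grads)}

    # Evaluate the pilot student on a held-out "quiz" batch; this loss is the
    # feedback signal that tells the teacher how well it taught.
    quiz_logits = functional_call(pilot, updated, (x_quiz,))
    meta_loss = F.cross_entropy(quiz_logits, y_quiz)

    teacher_opt.zero_grad()
    meta_loss.backward()  # gradients flow into the teacher through t_logits
    teacher_opt.step()

    # ---- Real student update with the improved teacher -------------------
    student_opt.zero_grad()
    with torch.no_grad():
        t_logits = teacher(x_train)
    s_out = student(x_train)
    loss = kd_loss(s_out, t_logits) + F.cross_entropy(s_out, y_train)
    loss.backward()
    student_opt.step()
    return meta_loss.item(), loss.item()
```

In this sketch the meta-gradient reaches the teacher because the pilot student's updated parameters depend on the teacher's logits; alternating these teacher and student updates over training batches is one plausible way to realize the "learning to teach" loop described above.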