Knowledge Distillation via Route Constrained Optimization

04/19/2019
by Xiao Jin, et al.

Distillation-based learning boosts the performance of a miniaturized neural network based on the hypothesis that the representation of a teacher model can serve as structured and relatively weak supervision, and thus would be easily learned by a miniaturized model. However, we find that the representation of a converged heavy model is still a strong constraint for training a small student model, which leads to a high lower bound of congruence loss. In this work, inspired by curriculum learning, we consider knowledge distillation from the perspective of routing along the teacher's optimization path. Instead of supervising the student model with a converged teacher model, we supervise it with a sequence of anchor points selected from the route in parameter space that the teacher model passed by, a strategy we call route constrained optimization (RCO). We experimentally demonstrate that this simple operation greatly reduces the lower bound of congruence loss for knowledge distillation, hint, and mimicking learning. On closed-set classification tasks such as CIFAR100 and ImageNet, RCO improves knowledge distillation by 2.14%. To evaluate its generalization, we also test RCO on the open-set face recognition task MegaFace.
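The core idea, supervising the student with intermediate teacher checkpoints (anchor points) rather than only the converged teacher, can be sketched as follows. This is a minimal illustrative sketch, not the paper's released implementation: the checkpoint paths, model classes, data loader, and hyperparameters are placeholder assumptions, and the per-anchor objective shown is standard Hinton-style knowledge distillation.

```python
# Minimal sketch of RCO-style distillation in PyTorch.
# Assumptions: `student`, `teacher`, `anchor_ckpts`, and `train_loader`
# are supplied by the caller; anchor checkpoints are ordered by the
# teacher's training epoch (easy-to-hard curriculum).
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard KD objective: softened KL term plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def train_rco(student, teacher, anchor_ckpts, train_loader,
              epochs_per_anchor=30, lr=0.1, device="cuda"):
    """Sequentially distill the student against each anchor point sampled
    from the teacher's optimization route, instead of only the final teacher."""
    student.to(device)
    optimizer = torch.optim.SGD(student.parameters(), lr=lr,
                                momentum=0.9, weight_decay=5e-4)
    for ckpt_path in anchor_ckpts:  # one curriculum stage per anchor
        teacher.load_state_dict(torch.load(ckpt_path, map_location=device))
        teacher.to(device).eval()
        for _ in range(epochs_per_anchor):
            for images, labels in train_loader:
                images, labels = images.to(device), labels.to(device)
                with torch.no_grad():
                    t_logits = teacher(images)      # supervision from this anchor
                s_logits = student(images)
                loss = kd_loss(s_logits, t_logits, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
    return student
```

Because each earlier anchor is an easier target than the fully converged teacher, the student follows a sequence of progressively harder supervision signals, which is what the paper argues lowers the achievable congruence loss.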

