Better Supervisory Signals by Observing Learning Paths

03/04/2022
by Yi Ren et al.

Better-supervised models might have better performance. In this paper, we first clarify what makes a good supervisory signal for a classification problem, and then explain two existing label-refining methods, label smoothing and knowledge distillation, in terms of our proposed criterion. To further answer why and how better supervision emerges, we observe the learning path, i.e., the trajectory of the model's predictions for each training sample during training. We find that the model can spontaneously refine "bad" labels through a "zig-zag" learning path, a phenomenon that occurs on both toy and real datasets. Observing learning paths not only offers a new perspective for understanding knowledge distillation, overfitting, and learning dynamics, but also reveals that the supervisory signal from a teacher network can be very unstable near its best points during training on real tasks. Inspired by this, we propose a new knowledge distillation scheme, Filter-KD, which improves downstream classification performance in various settings.
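
To make the notion of a learning path concrete, the following is a minimal, hypothetical PyTorch sketch (not the authors' code) that records each training sample's predicted distribution at every epoch on a toy two-class problem with a few deliberately flipped labels. The dataset, model size, and hyperparameters are illustrative assumptions.

```python
# Sketch: record per-sample "learning paths" (prediction trajectories) during training.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy 2-class problem: two Gaussian blobs, with a few labels flipped ("bad" labels).
n, num_classes = 200, 2
x = torch.cat([torch.randn(n // 2, 2) + 2.0, torch.randn(n // 2, 2) - 2.0])
y = torch.cat([torch.zeros(n // 2), torch.ones(n // 2)]).long()
flip = torch.randperm(n)[:10]          # hypothetical label noise
y[flip] = 1 - y[flip]

model = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.ReLU(), torch.nn.Linear(32, num_classes)
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

epochs = 50
paths = torch.zeros(epochs, n, num_classes)   # learning path for every sample

for epoch in range(epochs):
    # Snapshot the current predicted distribution for every training sample.
    with torch.no_grad():
        paths[epoch] = F.softmax(model(x), dim=-1)

    # One plain cross-entropy step on the (partly noisy) hard labels.
    opt.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    opt.step()

# Inspect the trajectory of one noisily-labelled sample across epochs.
print(paths[:, flip[0]])
```

Plotting paths[:, i] over epochs for noisily labelled samples i is one simple way to look for the "zig-zag" behaviour the abstract describes, before the model eventually overfits the wrong label.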


Related research

09/03/2022 · A Novel Self-Knowledge Distillation Approach with Siamese Representation Learning for Action Recognition
Knowledge distillation is an effective transfer of knowledge from a heav...

03/25/2020 · Circumventing Outliers of AutoAugment with Knowledge Distillation
AutoAugment has been a powerful algorithm that improves the accuracy of ...

12/04/2021 · KDCTime: Knowledge Distillation with Calibration on InceptionTime for Time-series Classification
Time-series classification approaches based on deep neural networks are ...

04/19/2019 · Knowledge Distillation via Route Constrained Optimization
Distillation-based learning boosts the performance of the miniaturized n...

11/18/2019 · Preparing Lessons: Improve Knowledge Distillation with Better Supervision
Knowledge distillation (KD) is widely used for training a compact model ...

12/02/2021 · A Fast Knowledge Distillation Framework for Visual Recognition
While Knowledge Distillation (KD) has been recognized as a useful tool i...

04/16/2020 · Knowledge Distillation for Action Anticipation via Label Smoothing
Human capability to anticipate near future from visual observations and ...
