Subclass Distillation

02/10/2020
by Rafael Müller et al.

After a large "teacher" neural network has been trained on labeled data, the probabilities that the teacher assigns to incorrect classes reveal a lot about how the teacher generalizes. By training a small "student" model to match these probabilities, it is possible to transfer most of the teacher's generalization ability to the student, often producing a much better small model than training the student directly on the training data. The transfer works best when there are many possible classes, because more is then revealed about the function the teacher has learned. When there are only a few possible classes, we show that the transfer can be improved by forcing the teacher to divide each class into many subclasses that it invents during supervised training; the student is then trained to match the subclass probabilities. For datasets with known, natural subclasses, we demonstrate that the teacher learns similar subclasses and that these improve distillation. For clickthrough datasets where the subclasses are unknown, we demonstrate that subclass distillation allows the student to learn faster and better.
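To make the mechanics concrete, below is a minimal sketch in PyTorch of the two losses the abstract describes: standard distillation against the teacher's softened probabilities, and a subclass variant in which both networks emit one logit per invented subclass. The function names, the temperature value, and the equal weighting of the two terms are illustrative assumptions rather than the paper's exact recipe, and any auxiliary regularization the teacher may need to keep its invented subclasses diverse is omitted.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Standard distillation: the student matches the teacher's softened
    class probabilities via a temperature-scaled KL divergence."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # The t*t factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

def subclass_distillation_loss(student_sub_logits, teacher_sub_logits,
                               labels, n_subclasses, temperature=4.0):
    """Subclass distillation sketch: both networks output
    n_classes * n_subclasses logits. The student matches the teacher's
    subclass distribution, while class probabilities (the sum of subclass
    probabilities within each class) feed the usual supervised loss."""
    # Distillation term over the invented subclasses.
    kd = distillation_loss(student_sub_logits, teacher_sub_logits, temperature)

    # Supervised term: aggregate subclass probabilities into class probabilities.
    batch, total = student_sub_logits.shape
    n_classes = total // n_subclasses
    sub_probs = F.softmax(student_sub_logits, dim=-1)
    class_probs = sub_probs.view(batch, n_classes, n_subclasses).sum(dim=-1)
    ce = F.nll_loss(torch.log(class_probs + 1e-12), labels)

    # Equal weighting of the two terms is an illustrative choice.
    return kd + ce

# Hypothetical usage: a binary task split into 5 invented subclasses per class,
# so both networks emit 10 logits per example.
# loss = subclass_distillation_loss(student(x), teacher(x).detach(),
#                                   labels, n_subclasses=5)
```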
