Knowledge Distillation: Bad Models Can Be Good Role Models

03/28/2022
by Gal Kaplun, et al.

Large neural networks trained in the overparameterized regime are able to fit noise to zero train error. Recent work <cit.> has empirically observed that such networks behave as "conditional samplers" from the noisy distribution. That is, they replicate the noise in the train data to unseen examples. We give a theoretical framework for studying this conditional sampling behavior in the context of learning theory. We relate the notion of such samplers to knowledge distillation, where a student network imitates the outputs of a teacher on unlabeled data. We show that samplers, while being bad classifiers, can be good teachers. Concretely, we prove that distillation from samplers is guaranteed to produce a student which approximates the Bayes optimal classifier. Finally, we show that some common learning algorithms (e.g., Nearest-Neighbours and Kernel Machines) can generate samplers when applied in the overparameterized regime.
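The intuition can be illustrated with a toy distillation sketch (this is an illustration of the setup, not the paper's construction; the dataset, noise rate, and model choices are assumptions). A 1-Nearest-Neighbour "teacher" fit on noisy labels interpolates the noise and achieves zero train error, behaving like a conditional sampler on unseen points; a simple "student" trained to imitate the teacher's predictions on unlabeled data can nonetheless approach the Bayes-optimal rule.

```python
# Toy sketch: distillation from an interpolating 1-NN "sampler" teacher.
# All specifics (1-D data, 20% label noise, logistic-regression student)
# are illustrative assumptions, not the paper's exact setting.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n):
    """1-D inputs; Bayes-optimal rule is y = 1[x > 0]; labels flipped w.p. 0.2."""
    x = rng.uniform(-1, 1, size=(n, 1))
    clean = (x[:, 0] > 0).astype(int)
    flip = rng.random(n) < 0.2          # 20% label noise
    return x, np.where(flip, 1 - clean, clean), clean

# Teacher: 1-NN fit on noisy labels -- interpolates the noise (zero train error)
# and replicates roughly the same noise level on unseen examples.
x_train, y_noisy, _ = sample(2000)
teacher = KNeighborsClassifier(n_neighbors=1).fit(x_train, y_noisy)

# Distillation: the student imitates the teacher's outputs on *unlabeled* data.
x_unlabeled, _, _ = sample(20000)
student = LogisticRegression().fit(x_unlabeled, teacher.predict(x_unlabeled))

# Evaluate against clean labels: the teacher is a "bad" classifier (~80% accuracy),
# while the distilled student lands close to the Bayes-optimal rule.
x_test, _, y_clean = sample(5000)
print("teacher acc vs clean labels:", (teacher.predict(x_test) == y_clean).mean())
print("student acc vs clean labels:", (student.predict(x_test) == y_clean).mean())
```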
