Why distillation helps: a statistical perspective

05/21/2020
by Aditya Krishna Menon, et al.

Knowledge distillation is a technique for improving the performance of a simple "student" model by replacing its one-hot training labels with a distribution over labels obtained from a complex "teacher" model. While this simple approach has proven widely effective, a basic question remains unresolved: why does distillation help? In this paper, we present a statistical perspective on distillation which addresses this question, and provides a novel connection to extreme multiclass retrieval techniques. Our core observation is that the teacher seeks to estimate the underlying (Bayes) class-probability function. Building on this, we establish a fundamental bias-variance tradeoff in the student's objective: this quantifies how approximate knowledge of these class-probabilities can significantly aid learning. Finally, we show how distillation complements existing negative mining techniques for extreme multiclass retrieval, and propose a unified objective which combines these ideas.
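To make the setup concrete, below is a minimal sketch (not the authors' code) of the distillation objective the abstract describes: the student is trained against the teacher's distribution over labels, viewed here as an approximation of the Bayes class-probability function, optionally mixed with the usual one-hot loss. It assumes PyTorch; the names student_logits, teacher_logits, labels, alpha, and temperature are illustrative, not from the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=2.0):
    """Convex combination of the one-hot loss and the teacher-matching loss."""
    # Standard cross-entropy against the one-hot training labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # The teacher's (temperature-smoothed) class probabilities, which in the
    # paper's view serve as an estimate of the Bayes class-probability function.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)

    # Cross-entropy of the student against the teacher's soft labels
    # (equal to the KL divergence up to a constant in the student's parameters).
    soft_loss = -(teacher_probs * student_log_probs).sum(dim=-1).mean()

    # alpha trades off the two terms; alpha=1 trains purely on teacher labels.
    return alpha * soft_loss + (1 - alpha) * hard_loss

Setting alpha between the two extremes reflects the bias-variance tradeoff the paper analyses: the teacher's probabilities are lower-variance targets than one-hot labels, at the cost of whatever bias the teacher's estimate carries.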

Related research

04/20/2021 · Knowledge Distillation as Semiparametric Inference
A popular approach to model compression is to train an inexpensive stude...

06/19/2021 · Teacher's pet: understanding and mitigating biases in distillation
Knowledge distillation is widely used as a means of improving the perfor...

06/13/2022 · Robust Distillation for Worst-class Performance
Knowledge distillation has proven to be an effective technique in improv...

02/01/2021 · Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective
Knowledge distillation is an effective approach to leverage a well-train...

09/07/2023 · Towards Comparable Knowledge Distillation in Semantic Image Segmentation
Knowledge Distillation (KD) is one proposed solution to large model size...

05/06/2022 · Collective Relevance Labeling for Passage Retrieval
Deep learning for Information Retrieval (IR) requires a large amount of ...

06/14/2019 · Effectiveness of Distillation Attack and Countermeasure on Neural Network Watermarking
The rise of machine learning as a service and model sharing platforms ha...
