Teacher's pet: understanding and mitigating biases in distillation

06/19/2021
by Michal Lukasik, et al.

Knowledge distillation is widely used to improve the performance of a relatively simple student model using the predictions of a complex teacher model. Several works have shown that distillation significantly boosts the student's overall performance; however, are these gains uniform across all data subgroups? In this paper, we show that distillation can harm performance on certain subgroups, e.g., classes with few associated samples. We trace this behaviour to errors in the teacher's predictive distribution being transferred to, and amplified by, the student model. To mitigate this problem, we present techniques that soften the teacher's influence on subgroups where it is less reliable. Experiments on several image classification benchmarks show that these modifications of distillation preserve the boost in overall accuracy while additionally ensuring improvement in subgroup performance.
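
The mitigation described in the abstract amounts to down-weighting the teacher's soft targets on subgroups where it is unreliable. The snippet below is a minimal sketch of that idea, not the paper's exact method: it implements a standard temperature-scaled distillation loss in PyTorch with a hypothetical per-class mixing weight derived from class frequency, so that rare classes rely more on the ground-truth labels and less on the teacher.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, class_counts,
                      temperature=4.0, max_alpha=0.9):
    """Distillation loss that softens the teacher's influence on rare classes.

    Illustrative sketch only: the per-class weighting below (alpha proportional
    to class frequency) is an assumption, not the paper's exact scheme.
    """
    # Hypothetical reliability weight: trust the teacher more on frequent classes.
    freq = class_counts.float() / class_counts.sum()
    alpha_per_class = max_alpha * (freq / freq.max())   # in [0, max_alpha]
    alpha = alpha_per_class[labels]                     # per-example weight

    # One-hot cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels, reduction="none")

    # Temperature-scaled KL divergence between teacher and student distributions.
    t_prob = F.softmax(teacher_logits / temperature, dim=-1)
    s_logprob = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(s_logprob, t_prob, reduction="none").sum(dim=-1) * temperature ** 2

    # Down-weight the teacher term where it is assumed to be less reliable.
    return ((1.0 - alpha) * ce + alpha * kl).mean()


if __name__ == "__main__":
    # Tiny smoke test with random logits for 3 imbalanced classes.
    student = torch.randn(8, 3)
    teacher = torch.randn(8, 3)
    labels = torch.randint(0, 3, (8,))
    counts = torch.tensor([500, 50, 5])
    print(distillation_loss(student, teacher, labels, counts).item())
```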


Related research

06/13/2022 · Robust Distillation for Worst-class Performance
Knowledge distillation has proven to be an effective technique in improv...

06/30/2020 · Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution
Knowledge distillation has been used to transfer knowledge learned by a ...

01/28/2023 · Supervision Complexity and its Role in Knowledge Distillation
Despite the popularity and efficacy of knowledge distillation, there is ...

05/21/2020 · Why distillation helps: a statistical perspective
Knowledge distillation is a technique for improving the performance of a...

05/03/2023 · SCOTT: Self-Consistent Chain-of-Thought Distillation
Large language models (LMs) beyond a certain scale demonstrate the emer...

09/26/2019 · Two-stage Image Classification Supervised by a Single Teacher Single Student Model
The two-stage strategy has been widely used in image classification. How...

06/04/2021 · Churn Reduction via Distillation
In real-world systems, models are frequently updated as more data become...
