Robust Distillation for Worst-class Performance

06/13/2022
by Serena Wang, et al.

Knowledge distillation has proven to be an effective technique for improving the performance of a student model using predictions from a teacher model. However, recent work has shown that gains in average performance are not uniform across subgroups in the data, and in particular can often come at the cost of accuracy on rare subgroups and classes. To preserve strong performance across classes that may follow a long-tailed distribution, we develop distillation techniques that are tailored to improve the student's worst-class performance. Specifically, we introduce robust optimization objectives in different combinations for the teacher and student, and further allow for training with any tradeoff between the overall accuracy and the robust worst-class objective. We show empirically that our robust distillation techniques not only achieve better worst-class performance, but also lead to a Pareto improvement in the tradeoff between overall performance and worst-class performance compared to other baseline methods. Theoretically, we provide insights into what makes a good teacher when the goal is to train a robust student.
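The abstract does not spell out the exact form of the robust objective. Below is a minimal sketch of one way a student-side loss could trade off overall accuracy against a worst-class term while still distilling from the teacher; the hard max over per-class losses, the temperature, the tradeoff weight, and the function name robust_distillation_loss are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def robust_distillation_loss(student_logits, teacher_logits, labels,
                             num_classes, tradeoff=0.5, temperature=2.0):
    """Illustrative sketch (not the paper's exact objective): blend a
    worst-class loss with average loss and a distillation term."""
    # Per-example cross-entropy against the true labels.
    ce = F.cross_entropy(student_logits, labels, reduction="none")

    # Worst-class objective: the largest mean loss over any single class
    # present in the batch (a hard-max stand-in for a robust objective).
    per_class_losses = []
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            per_class_losses.append(ce[mask].mean())
    worst_class = torch.stack(per_class_losses).max()

    # Standard distillation term: KL between softened teacher and student.
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    distill = F.kl_div(s_logp, t_probs, reduction="batchmean") * temperature ** 2

    # Tunable tradeoff between overall (average) accuracy and the
    # robust worst-class objective, plus the distillation term.
    average = ce.mean()
    return distill + (1 - tradeoff) * average + tradeoff * worst_class
```

The abstract also mentions applying robust objectives on the teacher side and in different teacher/student combinations; the sketch above only illustrates the student-side tradeoff.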
