Debiased Distillation by Transplanting the Last Layer

02/22/2023
by Jiwoon Lee, et al.

Deep models are susceptible to learning spurious correlations, even during post-processing. We take a closer look at knowledge distillation, a popular post-processing technique for model compression, and find that distilling with biased training data gives rise to a biased student, even when the teacher is debiased. To address this issue, we propose a simple knowledge distillation algorithm, coined DeTT (Debiasing by Teacher Transplanting). Inspired by a recent observation that the last neural network layer plays an overwhelmingly important role in debiasing, DeTT directly transplants the teacher's last layer to the student. The remaining layers are distilled by matching the feature-map outputs of the student and the teacher, with the samples reweighted to mitigate the dataset bias. Importantly, DeTT does not rely on extensive annotations of the bias-related attribute, which are typically unavailable during the post-processing phase. Throughout our experiments, DeTT successfully debiases the student model, consistently outperforming the baselines in terms of worst-group accuracy.
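
The abstract describes DeTT in two steps: transplant the teacher's last layer, then distill the remaining layers with a reweighted feature-matching loss. Below is a minimal PyTorch-style sketch of that procedure, assuming models with a `backbone` feature extractor and a final linear layer `fc`, matching feature dimensions between teacher and student, and a loader that yields per-sample weights; all of these names and details are illustrative assumptions, not the authors' actual implementation.

```python
import copy
import torch
import torch.nn.functional as F

def dett_distill(teacher, student, loader, optimizer, epochs=1):
    # Step 1 -- transplant: copy the teacher's debiased last layer into
    # the student verbatim and freeze it, so only the backbone is trained.
    # (Assumes the student's feature dimension matches the teacher's;
    # a real compression setup may need a learned projection instead.)
    student.fc = copy.deepcopy(teacher.fc)
    for p in student.fc.parameters():
        p.requires_grad = False

    teacher.eval()
    # `optimizer` is assumed to be built over student.backbone.parameters().
    for _ in range(epochs):
        # w: per-sample weights that upweight bias-conflicting examples,
        # mitigating the dataset bias without bias-attribute annotations.
        for x, w in loader:
            with torch.no_grad():
                t_feat = teacher.backbone(x)   # (batch, dim) features
            s_feat = student.backbone(x)
            # Step 2 -- distill the remaining layers by matching the
            # feature-map outputs of student and teacher, reweighted.
            per_sample = F.mse_loss(s_feat, t_feat, reduction="none").mean(dim=1)
            loss = (w * per_sample).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

Because the transplanted head is frozen, the sketch makes the division of labor explicit: debiasing quality rests on the teacher's last layer, while the reweighted feature matching keeps the student's backbone from reabsorbing the dataset bias.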

