Learning Interpretation with Explainable Knowledge Distillation

11/12/2021
by Raed Alharbi, et al.

Knowledge Distillation (KD) has been considered a key solution for model compression and acceleration in recent years. In KD, a small student model is generally trained from a large teacher model by minimizing the divergence between the probabilistic outputs of the two. However, as demonstrated in our experiments, existing KD methods might not transfer critical explainable knowledge of the teacher to the student, i.e., the explanations of predictions made by the two models are not consistent. In this paper, we propose a novel explainable knowledge distillation model, called XDistillation, through which both the performance and the explanations' information are transferred from the teacher model to the student model. The XDistillation model leverages the idea of convolutional autoencoders to approximate the teacher explanations. Our experiments show that models trained by XDistillation outperform those trained by conventional KD methods not only in terms of predictive accuracy but also in terms of faithfulness to the teacher models.
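For concreteness, the sketch below illustrates the conventional KD objective the abstract refers to, together with a simplified explanation-consistency term. This is an assumed PyTorch formulation, not the paper's implementation: the function names, the temperature T, the weight alpha, and the use of a plain MSE between saliency maps are illustrative choices; XDistillation itself approximates the teacher's explanations with a convolutional autoencoder.

```python
# Minimal sketch of the conventional KD objective: the student matches the
# teacher's softened output distribution (KL divergence) alongside the usual
# cross-entropy on ground-truth labels. T and alpha are assumed values.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soften both distributions with temperature T and measure their divergence.
    soft_student = F.log_softmax(student_logits / T, dim=1)
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    distill = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Standard supervised term on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce

def explanation_consistency(student_saliency, teacher_saliency):
    # Hypothetical stand-in for the explanation transfer the abstract describes:
    # penalize disagreement between student and teacher saliency maps. The paper
    # approximates teacher explanations with a convolutional autoencoder; a plain
    # MSE is used here only for illustration.
    return F.mse_loss(student_saliency, teacher_saliency)
```

In a training loop, the explanation-consistency term would be added to the KD loss with its own weight, so the student is optimized for both predictive agreement and explanation faithfulness to the teacher.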


