Faculty Distillation with Optimal Transport

04/25/2022
by Su Lu et al.

Knowledge distillation (KD) has proven effective in improving a student classifier when a suitable teacher is available. The recent outpouring of diverse pre-trained models provides an abundant pool of teacher resources for KD. However, these models are often trained on tasks different from the student's, which requires the student both to select the most helpful teacher and to perform KD across different label spaces. These requirements expose the limitations of standard KD and motivate us to study a new paradigm called faculty distillation: given a group of teachers (a faculty), a student needs to select the most relevant teacher and perform generalized knowledge reuse. To this end, we propose to link the teacher's task and the student's task via optimal transport. Based on the semantic relationship between their label spaces, we bridge the support gap between their output distributions by minimizing Sinkhorn distances. The transportation cost also serves as a measure of a teacher's adaptability, so we can rank teachers efficiently by their relatedness. Experiments under various settings demonstrate the succinctness and versatility of our method.
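To make the idea concrete, below is a minimal sketch of how Sinkhorn distances could be used to rank candidate teachers by transportation cost. It assumes label distributions are represented as histograms and that a cost matrix between the two label spaces is available (e.g., derived from semantic label embeddings); the function names `sinkhorn_distance` and `rank_teachers` are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def sinkhorn_distance(mu, nu, cost, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport between histograms mu and nu
    under a given cost matrix; returns the transport cost <P, C>."""
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(mu)
    for _ in range(n_iters):             # Sinkhorn fixed-point iterations
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    P = np.diag(u) @ K @ np.diag(v)      # approximate transport plan
    return float(np.sum(P * cost))

def rank_teachers(student_dist, teacher_dists, cost_matrices, eps=0.1):
    """Rank teachers by the Sinkhorn distance between the student's label
    distribution and each teacher's; a smaller transport cost suggests a
    more related (more adaptable) teacher."""
    costs = [sinkhorn_distance(student_dist, t, C, eps)
             for t, C in zip(teacher_dists, cost_matrices)]
    return np.argsort(costs)             # indices, most related teacher first
```

In this sketch, each cost matrix encodes how semantically far apart the student's and a teacher's classes are (for instance, one minus the cosine similarity of label embeddings), so the resulting transport cost simultaneously bridges the mismatched label spaces and scores the teacher's relatedness.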


