Triplet Knowledge Distillation

05/25/2023
by Xijun Wang, et al.

In knowledge distillation, the teacher is generally much larger than the student, so the teacher's solution is often too difficult for the student to learn. To ease the mimicking difficulty, we introduce a triplet knowledge distillation mechanism named TriKD. Besides the teacher and the student, TriKD employs a third role called the anchor model. Before distillation begins, the pre-trained anchor model delimits a subspace within the full solution space of the target problem; solutions within this subspace are expected to be easy targets that the student can mimic well. Distillation then proceeds in an online manner, and the teacher is only allowed to express solutions within the aforementioned subspace. Surprisingly, benefiting from accurate yet easy-to-mimic hints, the student can ultimately perform well. After the student is well trained, it can serve as the new anchor for new students, forming a curriculum learning strategy. Experiments on image classification and face recognition with various models clearly demonstrate the effectiveness of the method. The proposed TriKD is also effective in dealing with overfitting, and our theoretical analysis supports the rationality of triplet distillation.
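
The abstract does not give the concrete training objective, so the sketch below is only one plausible reading of the mechanism, written in PyTorch: a frozen, pre-trained anchor constrains the online-trained teacher to stay near its outputs (the "subspace" of easy solutions), while the student mimics that constrained teacher. The function names (trikd_step, kd_loss), the KL-based anchor constraint, and the loss weights w_anchor / w_mimic are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, T=4.0):
    # Standard soft-label KL distillation term (Hinton-style) with temperature T.
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)


def trikd_step(anchor, teacher, student, x, y,
               opt_t, opt_s, w_anchor=1.0, w_mimic=1.0):
    """One hypothetical online TriKD update (a sketch, not the paper's loss).

    anchor  : pre-trained and frozen; delimits the "easy-to-mimic" region.
    teacher : trained online, pulled toward the anchor's outputs.
    student : trained online, mimicking the constrained teacher.
    """
    with torch.no_grad():
        a_logits = anchor(x)

    t_logits = teacher(x)
    s_logits = student(x)

    # Teacher: fit the task, but stay close to the anchor's solution.
    loss_t = F.cross_entropy(t_logits, y) + w_anchor * kd_loss(t_logits, a_logits)
    # Student: fit the task and mimic the (anchor-constrained) teacher.
    loss_s = F.cross_entropy(s_logits, y) + w_mimic * kd_loss(s_logits, t_logits.detach())

    opt_t.zero_grad()
    loss_t.backward()
    opt_t.step()

    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()
    return loss_t.item(), loss_s.item()
```

In line with the curriculum strategy described above, once the student converges it could be frozen and reused as the anchor when distilling into an even smaller student.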

Related research

03/23/2021 · Student Network Learning via Evolutionary Knowledge Distillation
Knowledge distillation provides an effective way to transfer knowledge v...

04/20/2021 · Knowledge Distillation as Semiparametric Inference
A popular approach to model compression is to train an inexpensive stude...

08/03/2020 · Teacher-Student Training and Triplet Loss for Facial Expression Recognition under Occlusion
In this paper, we study the task of facial expression recognition under ...

09/23/2022 · Descriptor Distillation: a Teacher-Student-Regularized Framework for Learning Local Descriptors
Learning a fast and discriminative patch descriptor is a challenging top...

11/20/2021 · Teacher-Student Training and Triplet Loss to Reduce the Effect of Drastic Face Occlusion
We study a series of recognition tasks in two realistic scenarios requir...

11/22/2022 · A Generic Approach for Reproducible Model Distillation
Model distillation has been a popular method for producing interpretable...

12/01/2019 · Online Knowledge Distillation with Diverse Peers
Distillation is an effective knowledge-transfer technique that uses pred...