Knowledge Distillation via Token-level Relationship Graph

06/20/2023
by Shuoxi Zhang, et al.

Knowledge distillation is a powerful technique for transferring knowledge from a pre-trained teacher model to a student model. However, the true potential of this knowledge transfer has not been fully explored. Existing approaches primarily focus on distilling individual information or instance-level relationships, overlooking the valuable information embedded in token-level relationships, which may be particularly affected by long-tail effects. To address these limitations, we propose a novel method called Knowledge Distillation with Token-level Relationship Graph (TRG), which leverages token-wise relational knowledge to enhance the performance of knowledge distillation. By employing TRG, the student model can effectively emulate higher-level semantic information from the teacher model, resulting in improved distillation results. To further enhance the learning process, we introduce a token-wise contextual loss, referred to as the contextual loss, which encourages the student model to capture the inner-instance semantic context of the teacher model. We conduct experiments to evaluate the effectiveness of the proposed method against several state-of-the-art approaches. Empirical results demonstrate the superiority of TRG across various visual classification tasks, including those involving imbalanced data. Our method consistently outperforms existing baselines, establishing a new state-of-the-art in knowledge distillation.
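Since the full paper is not reproduced here, the following is only a minimal sketch of the idea described in the abstract: it treats the per-instance pairwise cosine-similarity matrix over transformer tokens as the token-level relationship graph and matches it between teacher and student, together with one plausible KL-based form of a token-wise contextual loss. The helper names (`token_relation_graph`, `trg_loss`, `contextual_loss`), the tensor shapes, the temperature `tau`, and the loss weighting are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of distilling a token-level
# relationship graph. Assumptions: teacher and student expose per-token
# features of shape [batch, num_tokens, dim]; the "graph" is taken to be
# the pairwise cosine-similarity matrix over tokens within each instance.

import torch
import torch.nn.functional as F


def token_relation_graph(tokens: torch.Tensor) -> torch.Tensor:
    """Build a per-instance token-to-token similarity graph.

    tokens: [B, N, D] token features -> returns a [B, N, N] adjacency matrix.
    """
    tokens = F.normalize(tokens, dim=-1)        # unit-norm token features
    return tokens @ tokens.transpose(1, 2)      # cosine-similarity graph


def trg_loss(student_tokens: torch.Tensor, teacher_tokens: torch.Tensor) -> torch.Tensor:
    """Match the student's token-relationship graph to the (frozen) teacher's."""
    g_s = token_relation_graph(student_tokens)
    g_t = token_relation_graph(teacher_tokens).detach()
    return F.mse_loss(g_s, g_t)


def contextual_loss(student_tokens: torch.Tensor, teacher_tokens: torch.Tensor,
                    tau: float = 1.0) -> torch.Tensor:
    """One plausible token-wise contextual loss: align each token's
    softmax-normalized relation to the other tokens of the same instance
    via KL divergence."""
    ctx_s = F.log_softmax(token_relation_graph(student_tokens) / tau, dim=-1)
    ctx_t = F.softmax(token_relation_graph(teacher_tokens).detach() / tau, dim=-1)
    return F.kl_div(ctx_s, ctx_t, reduction="batchmean")


if __name__ == "__main__":
    # Toy shapes: 4 images, 16 tokens each, student dim 192, teacher dim 384.
    s = torch.randn(4, 16, 192, requires_grad=True)
    t = torch.randn(4, 16, 384)
    loss = trg_loss(s, t) + 0.5 * contextual_loss(s, t)
    loss.backward()
    print(loss.item())
```

Because both graphs are N x N per instance, the teacher and student token dimensions do not need to match, which is one practical appeal of relation-based distillation over direct feature matching.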


Related research

11/27/2022: Class-aware Information for Logit-based Knowledge Distillation
Knowledge distillation aims to transfer knowledge to the student model b...

12/01/2020: Multi-level Knowledge Distillation
Knowledge distillation has become an important technique for model compr...

03/18/2021: Similarity Transfer for Knowledge Distillation
Knowledge distillation is a popular paradigm for learning portable neura...

11/06/2021: Class Token and Knowledge Distillation for Multi-head Self-Attention Speaker Verification Systems
This paper explores three novel approaches to improve the performance of...

04/03/2019: Correlation Congruence for Knowledge Distillation
Most teacher-student frameworks based on knowledge distillation (KD) dep...

05/16/2023: Lightweight Self-Knowledge Distillation with Multi-source Information Fusion
Knowledge Distillation (KD) is a powerful technique for transferring kno...
