Class-aware Information for Logit-based Knowledge Distillation

11/27/2022
by Shuoxi Zhang, et al.

Knowledge distillation aims to transfer knowledge to the student model by utilizing the predictions or features of the teacher model, and feature-based distillation has recently shown its superiority over logit-based distillation. However, due to the cumbersome computation and storage required for extra feature transformations, the training overhead of feature-based methods is much higher than that of logit-based distillation. In this work, we revisit logit-based knowledge distillation and observe that existing logit-based methods treat the prediction logits only at the instance level, while much other useful semantic information is overlooked. To address this issue, we propose a Class-aware Logit Knowledge Distillation (CLKD) method that extends logit distillation to both the instance level and the class level. CLKD enables the student model to mimic higher-level semantic information from the teacher model, thereby improving distillation performance. We further introduce a novel loss, called Class Correlation Loss, that forces the student to learn the inherent class-level correlations of the teacher. Empirical comparisons demonstrate the superiority of the proposed method over several prevailing logit-based and feature-based methods: CLKD achieves compelling results on various visual classification tasks and outperforms state-of-the-art baselines.
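The abstract only sketches the idea at a high level, so the snippet below is a minimal, illustrative PyTorch sketch of what a class-aware logit objective could look like, not the authors' implementation. It assumes logits of shape (batch, classes), a temperature-scaled KL divergence for the instance-level term, the same KL applied to the transposed (class-over-batch) logit matrix for the class-level term, and a correlation term that matches class-class similarity matrices; the function names (clkd_loss, etc.) and the weights alpha, beta, gamma are hypothetical.

```python
import torch
import torch.nn.functional as F

def instance_level_kd(student_logits, teacher_logits, T=4.0):
    """Standard logit KD: per-sample KL between temperature-softened class distributions."""
    p_s = F.log_softmax(student_logits / T, dim=1)
    p_t = F.softmax(teacher_logits / T, dim=1)
    return F.kl_div(p_s, p_t, reduction="batchmean") * (T * T)

def class_level_kd(student_logits, teacher_logits, T=4.0):
    """Class-aware term: softmax over the batch dimension, so each class's
    response pattern across the mini-batch is matched (transpose of the usual view)."""
    p_s = F.log_softmax(student_logits.t() / T, dim=1)   # shape (classes, batch)
    p_t = F.softmax(teacher_logits.t() / T, dim=1)
    return F.kl_div(p_s, p_t, reduction="batchmean") * (T * T)

def class_correlation_loss(student_logits, teacher_logits):
    """Match the inherent class-level correlation structure of teacher and student,
    computed here from L2-normalized class rows of the transposed logit matrix."""
    s = F.normalize(student_logits.t(), dim=1)           # (classes, batch)
    t = F.normalize(teacher_logits.t(), dim=1)
    corr_s = s @ s.t()                                    # (classes, classes)
    corr_t = t @ t.t()
    return F.mse_loss(corr_s, corr_t)

def clkd_loss(student_logits, teacher_logits, labels,
              alpha=1.0, beta=1.0, gamma=1.0, T=4.0):
    """Total training loss: cross-entropy + instance-level KD
    + class-level KD + class correlation (weights are illustrative)."""
    ce = F.cross_entropy(student_logits, labels)
    kd_inst = instance_level_kd(student_logits, teacher_logits, T)
    kd_cls = class_level_kd(student_logits, teacher_logits, T)
    cc = class_correlation_loss(student_logits, teacher_logits)
    return ce + alpha * kd_inst + beta * kd_cls + gamma * cc
```

In a training loop this would replace the usual KD loss: run teacher and student on the same batch, detach the teacher logits, and backpropagate clkd_loss through the student only.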

