Distilling a Powerful Student Model via Online Knowledge Distillation

03/26/2021
by Shaojie Li, et al.

Existing online knowledge distillation approaches either adopt the student with the best performance or construct an ensemble model for better holistic performance. However, the former strategy ignores the other students' information, while the latter increases the computational complexity. In this paper, we propose a novel method for online knowledge distillation, termed FFSD, which comprises two key components, Feature Fusion and Self-Distillation, to solve the above problems in a unified framework. Different from previous works, where all students are treated equally, the proposed FFSD splits them into a student leader and a set of common students. The feature fusion module then converts the concatenation of feature maps from all common students into a fused feature map, and this fused representation is used to assist the learning of the student leader. To enable the student leader to absorb more diverse information, we design an enhancement strategy to increase the diversity among students. In addition, a self-distillation module transforms the feature map of deeper layers into the shape of a shallower one; the shallower layers are then encouraged to mimic the transformed feature maps of the deeper layers, which helps the students generalize better. After training, we simply adopt the student leader, which achieves superior performance over the common students, without any increase in storage or inference cost. Extensive experiments on CIFAR-100 and ImageNet demonstrate the superiority of our FFSD over existing works. The code is available at https://github.com/SJLeo/FFSD.
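To make the two components more concrete, below is a minimal PyTorch sketch of the feature fusion and self-distillation ideas described in the abstract. The module names, layer choices (1x1 convolutions, bilinear resizing), detachment of targets, and the MSE mimicry losses are illustrative assumptions, not the authors' exact implementation; see the linked repository for the real code.

```python
# Hedged sketch of FFSD-style feature fusion and self-distillation.
# All names and design details here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureFusion(nn.Module):
    """Fuse the concatenated feature maps of the common students with a 1x1 conv."""

    def __init__(self, num_students: int, channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(num_students * channels, channels, kernel_size=1)

    def forward(self, student_feats):
        # student_feats: list of (B, C, H, W) tensors, one per common student
        return self.fuse(torch.cat(student_feats, dim=1))


class SelfDistillation(nn.Module):
    """Map a deeper feature map to the channel/spatial shape of a shallower one."""

    def __init__(self, deep_channels: int, shallow_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(deep_channels, shallow_channels, kernel_size=1)

    def forward(self, deep_feat, shallow_hw):
        x = self.proj(deep_feat)
        return F.interpolate(x, size=shallow_hw, mode="bilinear", align_corners=False)


def ffsd_aux_losses(leader_feat, common_feats, fusion,
                    shallow_feat, deep_feat, self_dist):
    # Student leader mimics the fused representation of the common students.
    fused = fusion([f.detach() for f in common_feats])
    fusion_loss = F.mse_loss(leader_feat, fused)

    # Shallower layer mimics the transformed feature map of the deeper layer.
    target = self_dist(deep_feat.detach(), shallow_feat.shape[-2:])
    self_dist_loss = F.mse_loss(shallow_feat, target)

    return fusion_loss, self_dist_loss
```

In a full training loop these auxiliary losses would be added (with suitable weights) to the usual cross-entropy and logit-distillation terms for each student; at inference time only the student leader is kept, so the fusion and projection modules add no deployment cost.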



research
06/16/2022

Multi-scale Feature Extraction and Fusion for Online Knowledge Distillation

Online knowledge distillation conducts knowledge transfer among all stud...
research
04/19/2019

Feature Fusion for Online Mutual Knowledge Distillation

We propose a learning framework named Feature Fusion Learning (FFL) that...
research
10/02/2020

Online Knowledge Distillation via Multi-branch Diversity Enhancement

Knowledge distillation is an effective method to transfer the knowledge ...
research
02/21/2020

Residual Knowledge Distillation

Knowledge distillation (KD) is one of the most potent ways for model com...
research
02/08/2022

Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation

Knowledge Distillation has shown very promising ability in transferring...
research
05/18/2022

[Re] Distilling Knowledge via Knowledge Review

This effort aims to reproduce the results of experiments and analyze the...
research
08/27/2020

MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation

Knowledge Distillation (KD) has been one of the most popular methods to...
