Tree-structured Auxiliary Online Knowledge Distillation

08/22/2022
by Wenye Lin, et al.

Traditional knowledge distillation adopts a two-stage training process: a teacher model is pre-trained and then transfers its knowledge to a compact student model. To overcome this limitation, online knowledge distillation performs distillation in a single stage, without requiring a pre-trained teacher. Recent research on online knowledge distillation mainly focuses on the design of the distillation objective, including attention and gating mechanisms. In this work, we instead focus on the design of the global architecture and propose Tree-Structured Auxiliary online knowledge distillation (TSA), which hierarchically adds more parallel peers for layers closer to the output to strengthen the effect of knowledge distillation. Different branches construct different views of the input, which serve as the source of the knowledge. The hierarchical structure implies that the transferred knowledge shifts from general to task-specific as the layers grow deeper. Extensive experiments on 3 computer vision and 4 natural language processing datasets show that our method achieves state-of-the-art performance without bells and whistles. To the best of our knowledge, we are the first to demonstrate the effectiveness of online knowledge distillation for machine translation tasks.
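
The abstract gives only a high-level description of the architecture. As a rough illustration of the idea (a shared trunk whose layers closer to the output fan out into progressively more parallel peers, trained jointly with a distillation term among the peers), here is a minimal PyTorch-style sketch. The branch counts, layer widths, temperature `T`, weight `alpha`, and the ensemble-as-teacher objective are illustrative assumptions, not the paper's exact method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TreeStudent(nn.Module):
    """Sketch of a tree-structured student for online KD.

    A shared trunk feeds two mid-level branches; each mid-level branch feeds
    two leaf classifiers, so layers closer to the output have more parallel
    peers. Branch counts and widths are illustrative, not from the paper.
    """

    def __init__(self, num_classes=10, width=64):
        super().__init__()
        # Shared low-level layers (knowledge here stays general).
        self.trunk = nn.Sequential(
            nn.Conv2d(3, width, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        # Two mid-level branches, each feeding two leaf heads (4 peers total).
        self.mid = nn.ModuleList([
            nn.Sequential(nn.Flatten(), nn.Linear(width * 16, width), nn.ReLU())
            for _ in range(2)
        ])
        self.heads = nn.ModuleList([nn.Linear(width, num_classes) for _ in range(4)])

    def forward(self, x):
        h = self.trunk(x)
        # Heads 0 and 1 hang off mid-branch 0; heads 2 and 3 off mid-branch 1,
        # so each leaf sees a different view of the shared features.
        return [head(self.mid[i // 2](h)) for i, head in enumerate(self.heads)]


def online_kd_loss(logits, targets, T=3.0, alpha=0.5):
    """Cross-entropy per peer plus KL toward the averaged (soft) ensemble."""
    ensemble = torch.stack(logits).mean(0).detach()
    soft = F.softmax(ensemble / T, dim=-1)
    total = 0.0
    for z in logits:
        ce = F.cross_entropy(z, targets)
        kd = F.kl_div(F.log_softmax(z / T, dim=-1), soft,
                      reduction="batchmean") * T * T
        total = total + ce + alpha * kd
    return total / len(logits)


if __name__ == "__main__":
    model = TreeStudent()
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    loss = online_kd_loss(model(x), y)
    loss.backward()
    print(loss.item())
```

In this sketch all peers are trained jointly in one stage; any of the leaf heads (or the averaged ensemble) can be kept at inference time and the auxiliary branches discarded.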

