DGD: Densifying the Knowledge of Neural Networks with Filter Grafting and Knowledge Distillation

04/26/2020
by Hao Cheng, et al.

With a fixed model structure, knowledge distillation and filter grafting are two effective ways to boost single-model accuracy. However, the working mechanisms of, and the differences between, distillation and grafting have not been fully unveiled. In this paper, we evaluate the effect of distillation and grafting at the filter level and find that the impacts of the two techniques are surprisingly complementary: distillation mostly enhances the knowledge of valid filters, while grafting mostly reactivates invalid filters. This observation guides us to design a unified training framework called DGD, in which distillation and grafting are naturally combined to increase the knowledge density inside the filters of a fixed model structure. Through extensive experiments, we show that the knowledge-densified network produced by DGD shares the advantages of both distillation and grafting, lifting model accuracy to a higher level.
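As a rough illustration of how the two components could interact in one training recipe, below is a minimal PyTorch-style sketch. It assumes two peer networks with identical architecture, a standard soft-target distillation loss, and an L1-norm criterion for deciding which filters count as "invalid" before grafting; the function names, the threshold, and the mixing coefficient alpha are illustrative assumptions, not the exact formulation used in the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target distillation loss: KL divergence between the
    temperature-softened teacher and student output distributions."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)

@torch.no_grad()
def graft_invalid_filters(net, peer, alpha=0.5, threshold=1e-2):
    """Overwrite the low-L1-norm ("invalid") conv filters of `net` with a
    weighted mix of the corresponding filters from a peer network that shares
    the same architecture. `alpha` and `threshold` are illustrative choices."""
    for (_, w), (_, w_peer) in zip(net.named_parameters(), peer.named_parameters()):
        if w.dim() != 4:                       # only graft 4-D conv filter tensors
            continue
        norms = w.abs().flatten(1).sum(dim=1)  # per-filter L1 norm
        invalid = norms < threshold            # filters considered "invalid"
        w[invalid] = alpha * w[invalid] + (1 - alpha) * w_peer[invalid]
```

One natural schedule under these assumptions is to add `distillation_loss` to the cross-entropy objective on every mini-batch and to call `graft_invalid_filters(student, peer_student)` only at epoch boundaries, so that distillation refines the valid filters continuously while grafting periodically revives the invalid ones.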


