Knowledge Representing: Efficient, Sparse Representation of Prior Knowledge for Knowledge Distillation

11/13/2019
by Junjie Liu, et al.

Although recent works on knowledge distillation (KD) have achieved further improvements by elaborately modeling the decision boundary as posterior knowledge, their performance still depends on the assumption that the target network has sufficient capacity (representation ability). In this paper, we propose a knowledge representing (KR) framework that focuses on modeling the parameter distribution as prior knowledge. First, we introduce a knowledge aggregation scheme to answer how the prior knowledge of the teacher network should be represented. By aggregating the teacher's parameter distribution into a more abstract level, the scheme alleviates residual accumulation in the deeper layers. Second, to address the critical question of which prior knowledge matters most for distillation, we design a sparse recoding penalty that constrains the student network to learn with penalized gradients. With the proposed penalty, the student network effectively avoids over-regularization during distillation and converges faster. Quantitative experiments show that the proposed framework achieves state-of-the-art performance even when the target network lacks the expected capacity. Moreover, the framework is flexible enough to be combined with other KD methods based on posterior knowledge.
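To make the training setup concrete, the sketch below is a minimal PyTorch illustration under assumptions of our own: it pairs the usual Hinton-style distillation loss (softened KL plus hard-label cross-entropy) with an L1-style weight penalty as a stand-in for the sparse recoding penalty, since the abstract does not give the penalty's exact form (and the knowledge aggregation scheme is not sketched at all). The names distillation_loss, sparse_recoding_penalty, train_step and the hyperparameters T, alpha, and lam are illustrative, not the paper's.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    # Hinton-style KD: softened KL between teacher and student, plus hard-label CE.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def sparse_recoding_penalty(student, lam=1e-4):
    # Stand-in for the paper's sparse recoding penalty: an L1 term on the
    # student's weights, so sparsity is encouraged through the penalized
    # gradients at every update (the paper's actual formulation may differ).
    return lam * sum(p.abs().sum() for p in student.parameters() if p.requires_grad)

def train_step(student, teacher, x, y, optimizer):
    # One illustrative distillation step: the teacher provides soft targets,
    # and the penalty adds a sparsity-inducing term to the student's gradients.
    teacher.eval()
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    loss = distillation_loss(student_logits, teacher_logits, y) \
           + sparse_recoding_penalty(student)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch the penalty enters the loss, so it shapes the gradients the student learns with, which is the mechanism the abstract describes; how the penalty is scheduled or which layers it targets is not specified there.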

Related research:

- Recurrent Knowledge Distillation (05/18/2018). Knowledge distillation compacts deep networks by letting a small student...
- Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation (06/13/2022). Knowledge distillation (KD) has shown very promising capabilities in tra...
- SoTeacher: A Student-oriented Teacher Network Training Framework for Knowledge Distillation (06/14/2022). How to train an ideal teacher for knowledge distillation is still an ope...
- Visualizing the embedding space to explain the effect of knowledge distillation (10/09/2021). Recent research has found that knowledge distillation can be effective i...
- Prior Knowledge Guided Network for Video Anomaly Detection (09/04/2023). Video Anomaly Detection (VAD) involves detecting anomalous events in vid...
- Knowledge Distillation with Adversarial Samples Supporting Decision Boundary (05/15/2018). Many recent works on knowledge distillation have provided ways to transf...
- Prior knowledge distillation based on financial time series (06/16/2020). One of the major characteristics of financial time series is that they c...
