Self-Feature Regularization: Self-Feature Distillation Without Teacher Models

03/12/2021
by Wenxuan Fan, et al.

Knowledge distillation is the process of transferring knowledge from a large model to a small model. In this process, the small model learns the generalization ability of the large model and retains performance close to that of the large model. Knowledge distillation thus provides a way to migrate knowledge between models, facilitating deployment and speeding up inference. However, previous distillation methods require pre-trained teacher models, which still incur computational and storage overhead. In this paper, we propose a novel general training framework called Self-Feature Regularization (SFR), which uses features in the deep layers to supervise feature learning in the shallow layers and thereby retains more semantic information. Specifically, we first use an EMD-l2 loss to match local features and a many-to-one approach to distill features more intensively along the channel dimension. Dynamic label smoothing is then applied at the output layer to achieve better performance. Experiments further demonstrate the effectiveness of the proposed framework.
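
To make the framework concrete, the sketch below shows one way such a self-feature loss could be wired up in PyTorch. It is only an illustration under stated assumptions, not the authors' implementation: the paper's EMD-l2 loss is stood in for by a plain squared-L2 (MSE) term, its dynamic label smoothing by PyTorch's standard static label smoothing, and the many-to-one channel matching by simple channel-group averaging. All tensor shapes, names, and hyperparameters are hypothetical.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of the SFR idea: deep-layer features act as the
# supervisory signal for shallow-layer features, channels are aggregated
# many-to-one so the two maps can be compared, and the output layer is
# trained with label smoothing. Not the authors' code.

def many_to_one_channels(feat_deep, target_channels):
    """Average groups of deep channels so the channel count matches the
    shallow feature map (a simple many-to-one aggregation)."""
    b, c, h, w = feat_deep.shape
    assert c % target_channels == 0, "assumes channel counts divide evenly"
    group = c // target_channels
    return feat_deep.view(b, target_channels, group, h, w).mean(dim=2)

def sfr_loss(feat_shallow, feat_deep, logits, targets, alpha=0.5, smoothing=0.1):
    """Label-smoothed cross-entropy plus an L2 penalty that pulls shallow
    features toward (detached) deep features."""
    b, c_s, h_s, w_s = feat_shallow.shape
    # deep features are the "teacher" signal: no gradient flows through them
    teacher = many_to_one_channels(feat_deep.detach(), c_s)
    # match spatial resolution before the elementwise L2 comparison
    teacher = F.interpolate(teacher, size=(h_s, w_s), mode="bilinear",
                            align_corners=False)
    feat_loss = F.mse_loss(feat_shallow, teacher)
    ce_loss = F.cross_entropy(logits, targets, label_smoothing=smoothing)
    return ce_loss + alpha * feat_loss

# Toy usage with random tensors standing in for intermediate activations.
feat_shallow = torch.randn(8, 256, 28, 28, requires_grad=True)  # early stage
feat_deep    = torch.randn(8, 512, 7, 7)                        # last stage
logits       = torch.randn(8, 100, requires_grad=True)          # classifier output
targets      = torch.randint(0, 100, (8,))                      # ground-truth labels

loss = sfr_loss(feat_shallow, feat_deep, logits, targets)
loss.backward()
```

In a real network the shallow and deep activations would come from intermediate layers of the same model during a single forward pass, so no separate teacher model is needed, which is the point of the framework.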

Related research

Deeply-Supervised Knowledge Distillation (02/16/2022)
Knowledge distillation aims to enhance the performance of a lightweight ...

Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation (03/15/2021)
Knowledge distillation is a method of transferring the knowledge from a ...

Extending Label Smoothing Regularization with Self-Knowledge Distillation (09/11/2020)
Inspired by the strong correlation between the Label Smoothing Regulariz...

Self-discipline on multiple channels (04/27/2023)
Self-distillation relies on its own information to improve the generaliz...

Learning From Yourself: A Self-Distillation Method for Fake Speech Detection (03/02/2023)
In this paper, we propose a novel self-distillation method for fake spee...

Self-Distillation Amplifies Regularization in Hilbert Space (02/13/2020)
Knowledge distillation introduced in the deep learning context is a meth...

Self-Distillation for Gaussian Process Regression and Classification (04/05/2023)
We propose two approaches to extend the notion of knowledge distillation...