Efficient One Pass Self-distillation with Zipf's Label Smoothing

07/26/2022
by   Jiajun Liang, et al.

Self-distillation exploits non-uniform soft supervision produced by the network itself during training and improves performance without any runtime cost. However, the overhead it adds during training is often overlooked, and reducing time and memory overhead during training is increasingly important in the era of giant models. This paper proposes an efficient self-distillation method named Zipf's Label Smoothing (Zipf's LS), which uses the network's on-the-fly prediction to generate soft supervision that conforms to a Zipf distribution, without using any contrastive samples or auxiliary parameters. Our idea comes from an empirical observation: when the network is duly trained, the output values of its final softmax layer, sorted by magnitude and averaged across samples, follow a distribution reminiscent of Zipf's law in the word-frequency statistics of natural languages. By enforcing this property at the sample level and throughout the whole training period, we find that prediction accuracy can be greatly improved. Using ResNet50 on the iNat21 fine-grained classification dataset, our technique achieves a +3.61% accuracy gain over the vanilla baseline and a further +0.88% gain over previous label smoothing and self-distillation strategies. The implementation is publicly available at https://github.com/megvii-research/zipfls.

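The following is a minimal PyTorch sketch of the idea described in the abstract, assuming a standard cross-entropy plus KL-divergence formulation: the network's own ranking of the classes determines where a fixed Zipf-shaped probability mass is placed, and the prediction is pulled toward that soft target. The function names, the Zipf exponent `power`, and the mixing weight `alpha` are illustrative assumptions, not the paper's exact formulation or hyperparameters.

```python
import torch
import torch.nn.functional as F


def zipf_soft_targets(logits: torch.Tensor, power: float = 1.0) -> torch.Tensor:
    """Per-sample soft targets whose probabilities, read in the order of the
    network's own ranking, follow a Zipf-like power law 1 / rank^power.
    (Illustrative sketch, not the paper's exact construction.)"""
    n, c = logits.shape
    # Rank classes by the current (on-the-fly) prediction, most confident first.
    order = logits.argsort(dim=1, descending=True)
    # Zipf probabilities over ranks, normalized to sum to 1.
    ranks = torch.arange(1, c + 1, dtype=logits.dtype, device=logits.device)
    zipf = ranks.pow(-power)
    zipf = zipf / zipf.sum()
    # Scatter the rank-ordered Zipf mass back to the original class indices.
    targets = torch.zeros_like(logits)
    targets.scatter_(1, order, zipf.expand(n, c))
    return targets


def zipf_ls_loss(logits: torch.Tensor, labels: torch.Tensor,
                 alpha: float = 0.1, power: float = 1.0) -> torch.Tensor:
    """Hard-label cross-entropy plus a KL term that pulls each sample's
    prediction toward its Zipf-shaped soft targets."""
    ce = F.cross_entropy(logits, labels)
    # Detach so the soft targets act as fixed supervision for this step.
    soft = zipf_soft_targets(logits.detach(), power)
    kl = F.kl_div(F.log_softmax(logits, dim=1), soft, reduction="batchmean")
    return ce + alpha * kl
```

In a typical training loop, this sketch would simply replace the usual `F.cross_entropy(logits, labels)` call with `zipf_ls_loss(logits, labels)`; no extra forward passes, contrastive samples, or auxiliary parameters are needed, which is the point of the one-pass formulation.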
