Self-Distillation from the Last Mini-Batch for Consistency Regularization

03/30/2022
by Yiqing Shen, et al.

Knowledge distillation (KD) shows promise as a powerful regularization strategy that boosts generalization ability by leveraging learned sample-level soft targets. Yet employing a complex pre-trained teacher network or an ensemble of peer students, as existing KD methods do, is both time-consuming and computationally costly. Various self-KD methods have been proposed to achieve higher distillation efficiency, but they either require extra network architecture modification or are difficult to parallelize. To cope with these challenges, we propose an efficient and reliable self-distillation framework, named Self-Distillation from Last Mini-Batch (DLB). Specifically, we rearrange the sequential sampling so that half of each mini-batch coincides with the previous iteration, while the other half coincides with the upcoming iteration. The former half then distills from the on-the-fly soft targets generated in the previous iteration. Our proposed mechanism improves training stability and consistency, resulting in robustness to label noise. Moreover, our method is easy to implement, requiring neither extra run-time memory nor model structure modification. Experimental results on three classification benchmarks illustrate that our approach consistently outperforms state-of-the-art self-distillation approaches across different network architectures. Additionally, our method is highly compatible with augmentation strategies, yielding additional performance gains. The code is available at https://github.com/Meta-knowledge-Lab/DLB.
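
The mini-batch overlap described in the abstract can be made concrete with a short training-step sketch. The snippet below is a minimal illustration in PyTorch under our own assumptions: the function dlb_step, its arguments, and the default values of temperature and alpha are hypothetical and do not come from the official repository linked above; it only shows how soft targets cached from the previous iteration could drive a consistency term on the overlapping half of the current mini-batch.

# Minimal sketch (assumed, not the authors' released code) of one DLB-style
# training step: the batch is the carried-over half from the last iteration
# plus a freshly sampled half, and the carried-over half is additionally
# pulled toward the soft targets cached for it one iteration ago.
import torch
import torch.nn.functional as F

def dlb_step(model, optimizer,
             prev_images, prev_labels, prev_soft_targets,
             new_images, new_labels,
             temperature=3.0, alpha=1.0):
    model.train()
    optimizer.zero_grad()

    images = torch.cat([prev_images, new_images], dim=0)
    labels = torch.cat([prev_labels, new_labels], dim=0)
    logits = model(images)
    logits_prev = logits[:prev_images.size(0)]
    logits_new = logits[prev_images.size(0):]

    # Supervised cross-entropy on the whole mini-batch.
    ce_loss = F.cross_entropy(logits, labels)

    # Consistency term on the overlapping half: match the softened
    # predictions cached in the previous iteration (already detached).
    distill_loss = F.kl_div(
        F.log_softmax(logits_prev / temperature, dim=1),
        prev_soft_targets,
        reduction="batchmean",
    ) * (temperature ** 2)

    loss = ce_loss + alpha * distill_loss
    loss.backward()
    optimizer.step()

    # Cache softened predictions of the fresh half; they become the
    # distillation targets when these samples reappear next iteration.
    next_soft_targets = F.softmax(logits_new.detach() / temperature, dim=1)
    return loss.item(), next_soft_targets

At the very first iteration there is no carried-over half, so one would run a plain cross-entropy step on a fresh batch and start caching soft targets from there; how the sampler interleaves the two halves across iterations is left to the data-loading code.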


