Channel Self-Supervision for Online Knowledge Distillation

03/22/2022
by Shixiao Fan, et al.

Recently, researchers have shown an increased interest in online knowledge distillation. Adopting a one-stage, end-to-end training fashion, online knowledge distillation uses the aggregated intermediate predictions of multiple peer models for training. However, the absence of a powerful teacher model may lead to a homogenization problem among the group peers, adversely affecting the effectiveness of group distillation. In this paper, we propose a novel online knowledge distillation method, Channel Self-Supervision for Online Knowledge Distillation (CSS), which introduces diversity in terms of input, target, and network to alleviate the homogenization problem. Specifically, we construct a dual-network multi-branch structure and enhance inter-branch diversity through self-supervised learning, applying feature-level transformations and augmenting the corresponding labels. Meanwhile, the dual-network structure provides a larger space of independent parameters to resist homogenization during distillation. Extensive quantitative experiments on CIFAR-100 show that our method yields greater diversity than OKDDip and delivers notable performance improvements, even over state-of-the-art methods such as PCL. The results on three fine-grained datasets (StanfordDogs, StanfordCars, CUB-200-2011) also demonstrate the strong generalization capability of our approach.
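To make the group-distillation idea in the abstract concrete, below is a minimal sketch of a generic online knowledge distillation objective: each peer branch is trained on the hard labels plus a KL term that pulls it toward the aggregated prediction of the peers. This is an illustrative assumption of the general scheme, not the authors' CSS implementation; the peer networks, temperature, and loss weight are hypothetical placeholders chosen for clarity.

```python
# Generic online KD loss: per-peer cross-entropy + KL to the averaged peer ensemble.
# Sketch only; CSS additionally uses a dual-network multi-branch structure and
# channel self-supervision, which are not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F

def online_kd_loss(logits_list, targets, temperature=3.0, kd_weight=1.0):
    """Cross-entropy on each peer plus distillation toward the peer ensemble."""
    # Supervised loss for every peer branch.
    ce = sum(F.cross_entropy(logits, targets) for logits in logits_list)
    # Aggregate peer predictions (simple mean; attention-based weighting is also common).
    ensemble = torch.stack(logits_list, dim=0).mean(dim=0).detach()
    soft_target = F.softmax(ensemble / temperature, dim=1)
    # KL divergence of each peer's softened prediction against the ensemble target.
    kd = sum(
        F.kl_div(F.log_softmax(logits / temperature, dim=1),
                 soft_target, reduction="batchmean") * temperature ** 2
        for logits in logits_list
    )
    return ce + kd_weight * kd

# Toy usage with two small peer branches on random data.
peers = [nn.Linear(32, 100), nn.Linear(32, 100)]
x = torch.randn(8, 32)
y = torch.randint(0, 100, (8,))
loss = online_kd_loss([p(x) for p in peers], y)
loss.backward()
```

Detaching the ensemble target is a common design choice so that each peer is pulled toward the group consensus without the gradient of the consensus flowing back through every branch; whether CSS does this is not stated in the abstract.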
