Peer Collaborative Learning for Online Knowledge Distillation

06/07/2020
by   Guile Wu, et al.

Traditional knowledge distillation uses a two-stage training strategy to transfer knowledge from a high-capacity teacher model to a smaller student model, relying heavily on the pre-trained teacher. Recent online knowledge distillation alleviates this limitation by collaborative learning, mutual learning and online ensembling, following a one-stage end-to-end training strategy. However, collaborative learning and mutual learning fail to construct an online high-capacity teacher, whilst online ensembling ignores the collaboration among branches and its logit summation impedes further optimisation of the ensemble teacher. In this work, we propose a novel Peer Collaborative Learning method for online knowledge distillation. Specifically, we employ a multi-branch network (each branch is a peer) and assemble the features from all peers with an additional classifier as the peer ensemble teacher, which transfers knowledge from this high-capacity teacher to the peers and further optimises the ensemble teacher. Meanwhile, we employ the temporal mean model of each peer as the peer mean teacher to collaboratively transfer knowledge among peers, which helps optimise a more stable model and alleviates the accumulation of training error among the peers. Integrating these components into a unified framework takes full advantage of online ensembling and network collaboration to improve the quality of online distillation. Extensive experiments on CIFAR-10, CIFAR-100 and ImageNet show that the proposed method not only significantly improves the generalisation capability of various backbone networks, but also outperforms the state-of-the-art alternative methods.
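
The two teacher constructions described in the abstract (an extra classifier over the concatenated peer features acting as the peer ensemble teacher, and a temporal mean copy of each peer acting as the peer mean teacher) can be outlined with a minimal PyTorch sketch. All names below (PeerEnsembleNet, distill, update_mean_teacher), the temperature and the EMA momentum are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the two teacher constructions, assuming PyTorch.
# Class and function names are illustrative, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PeerEnsembleNet(nn.Module):
    """Multi-branch network: a shared trunk, several peer branches with their
    own classifiers, and an extra classifier over the concatenated peer
    features that serves as the peer ensemble teacher."""

    def __init__(self, shared: nn.Module, branches: nn.ModuleList,
                 feat_dim: int, num_classes: int):
        super().__init__()
        self.shared = shared            # shared low-level layers
        self.branches = branches        # each peer's high-level layers
        self.peer_heads = nn.ModuleList(
            nn.Linear(feat_dim, num_classes) for _ in branches)
        self.ensemble_head = nn.Linear(feat_dim * len(branches), num_classes)

    def forward(self, x):
        h = self.shared(x)
        feats = [branch(h).flatten(1) for branch in self.branches]
        peer_logits = [head(f) for head, f in zip(self.peer_heads, feats)]
        ensemble_logits = self.ensemble_head(torch.cat(feats, dim=1))
        return peer_logits, ensemble_logits


def distill(student_logits, teacher_logits, T=3.0):
    """Soft-label distillation loss (KL divergence at temperature T)."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits.detach() / T, dim=1),
                    reduction="batchmean") * T * T


@torch.no_grad()
def update_mean_teacher(mean_peer: nn.Module, peer: nn.Module, momentum=0.999):
    """Peer mean teacher: exponential moving average of a peer's weights."""
    for p_mean, p in zip(mean_peer.parameters(), peer.parameters()):
        p_mean.mul_(momentum).add_(p, alpha=1.0 - momentum)
```

Under these assumptions, a training step would combine cross-entropy on each peer's logits with distillation from ensemble_logits to each peer and a collaborative term towards the other peers' mean-teacher predictions, calling update_mean_teacher after the optimiser step; the exact loss weighting used in the paper is not reproduced here.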


Related research

11/23/2021 · Semi-Online Knowledge Distillation
Knowledge distillation is an effective and stable method for model compr...

03/21/2023 · Heterogeneous-Branch Collaborative Learning for Dialogue Generation
With the development of deep learning, advanced dialogue generation meth...

06/12/2018 · Knowledge Distillation by On-the-Fly Native Ensemble
Knowledge distillation is effective to train small and generalisable net...

03/22/2022 · Channel Self-Supervision for Online Knowledge Distillation
Recently, researchers have shown an increased interest in the online kno...

06/24/2022 · Online Distillation with Mixed Sample Augmentation
Mixed Sample Regularization (MSR), such as MixUp or CutMix, is a powerfu...

09/10/2019 · Knowledge Transfer Graph for Deep Collaborative Learning
We propose Deep Collaborative Learning (DCL), which is a method that inc...

12/09/2020 · Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation
In reinforcement learning, domain randomisation is an increasingly popul...
