Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning

04/13/2023
by Kaiyou Song, et al.

Self-supervised learning (SSL) has made remarkable progress in visual representation learning. Some studies combine SSL with knowledge distillation (SSL-KD) to boost the representation learning performance of small models. In this study, we propose a Multi-mode Online Knowledge Distillation method (MOKD) to boost self-supervised visual representation learning. Unlike existing SSL-KD methods, which transfer knowledge from a static pre-trained teacher to a student, in MOKD two different models learn collaboratively in a self-supervised manner. Specifically, MOKD consists of two distillation modes: self-distillation and cross-distillation. Self-distillation performs self-supervised learning for each model independently, while cross-distillation enables knowledge interaction between the two models. In cross-distillation, a cross-attention feature search strategy is proposed to enhance the semantic feature alignment between the models. As a result, the two models can absorb knowledge from each other to boost their representation learning performance. Extensive experiments on different backbones and datasets demonstrate that two heterogeneous models can benefit from MOKD and outperform their independently trained baselines. In addition, MOKD also outperforms existing SSL-KD methods for both the student and the teacher models.
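To make the two-mode structure concrete, below is a minimal PyTorch sketch of how a self-distillation term (view consistency within each model) and a cross-distillation term (consistency across the two models) could be combined in one training step. This is not the authors' implementation: the encoder architecture, the cosine-consistency loss, and the names Encoder and consistency_loss are assumptions for illustration, and MOKD's actual cross-distillation uses a cross-attention feature search rather than the plain consistency term shown here.

# Illustrative sketch only -- hypothetical encoders and losses, not MOKD's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Stand-in backbone + projection head (hypothetical sizes)."""
    def __init__(self, dim_in=3 * 32 * 32, dim_out=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(), nn.Linear(dim_in, 512), nn.ReLU(), nn.Linear(512, dim_out)
        )

    def forward(self, x):
        # L2-normalized embeddings for cosine-based consistency.
        return F.normalize(self.net(x), dim=-1)

def consistency_loss(p, q):
    """Negative cosine similarity; the second argument acts as the target."""
    return -(p * q.detach()).sum(dim=-1).mean()

# Two heterogeneous models learning collaboratively (simple MLPs for brevity).
model_a, model_b = Encoder(), Encoder()
opt = torch.optim.SGD(list(model_a.parameters()) + list(model_b.parameters()), lr=0.05)

# Two augmented views of the same batch (random tensors stand in for images).
view1, view2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)

za1, za2 = model_a(view1), model_a(view2)
zb1, zb2 = model_b(view1), model_b(view2)

# Self-distillation mode: each model enforces view consistency on its own.
loss_self = consistency_loss(za1, za2) + consistency_loss(zb1, zb2)

# Cross-distillation mode: each model aligns its features with the other's.
# (In MOKD this alignment uses a cross-attention feature search; a plain
# consistency term is used here only to show where that loss plugs in.)
loss_cross = consistency_loss(za1, zb2) + consistency_loss(zb1, za2)

(loss_self + loss_cross).backward()
opt.step()

In this sketch both models are optimized jointly on the sum of the two losses, which mirrors the online, teacher-free setting described above: neither model is frozen, so knowledge flows in both directions.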


Related research

11/23/2022
Distilling Knowledge from Self-Supervised Teacher by Embedding Graph Alignment
Recent advances have indicated the strengths of self-supervised pre-trai...

07/21/2022
KD-MVS: Knowledge Distillation Based Self-supervised Learning for MVS
Supervised multi-view stereo (MVS) methods have achieved remarkable prog...

07/06/2023
On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation
Large self-supervised models are effective feature extractors, but their...

09/03/2022
A Novel Self-Knowledge Distillation Approach with Siamese Representation Learning for Action Recognition
Knowledge distillation is an effective transfer of knowledge from a heav...

04/20/2021
Distill on the Go: Online knowledge distillation in self-supervised learning
Self-supervised learning solves pretext prediction tasks that do not req...

11/23/2021
Domain-Agnostic Clustering with Self-Distillation
Recent advancements in self-supervised learning have reduced the gap bet...

07/30/2021
On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals
It is a consensus that small models perform quite poorly under the parad...
