On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals

07/30/2021
by Haizhou Shi, et al.

It is widely accepted that small models perform poorly under the paradigm of self-supervised contrastive learning. Existing methods usually adopt a large off-the-shelf model and transfer its knowledge to the small one via knowledge distillation. Despite their effectiveness, distillation-based methods may be unsuitable for resource-restricted scenarios because deploying a large teacher model is computationally expensive. In this paper, we study the problem of training small self-supervised models without distillation signals. We first evaluate the representation spaces of the small models and make two notable observations: (i) small models can complete the pretext task without overfitting despite their limited capacity, and (ii) small models universally suffer from the problem of over-clustering. We then verify several hypotheses expected to alleviate the over-clustering phenomenon. Finally, we combine the validated techniques and improve the baselines of five small architectures by considerable margins, which indicates that training small self-supervised contrastive models is feasible even without distillation signals.
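
The abstract does not spell out the training recipe, but the setting it studies can be illustrated with a standard SimCLR-style contrastive objective applied directly to a lightweight encoder, with no large teacher and no distillation loss anywhere in the pipeline. The sketch below is an assumption-laden illustration of that setting: the NT-Xent loss, the temperature value, and the small_encoder name are illustrative choices, not the authors' code.

# A minimal sketch, assuming a SimCLR-style NT-Xent objective; this illustrates
# the "no distillation signal" setting described above, not the authors' method.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.2):
    """Contrastive loss over two augmented views; z1 and z2 are (batch, dim)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)      # (2n, dim), unit norm
    sim = torch.mm(z, z.t()) / temperature                   # (2n, 2n) cosine similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))               # drop self-similarity
    # The positive for sample i is its other augmented view at index (i + n) mod 2n.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Usage sketch: `small_encoder` (hypothetical name) is any lightweight backbone
# plus a projection head, e.g. ResNet-18 or MobileNetV3; view1/view2 are two
# random augmentations of the same image batch. No teacher model is involved.
# z1, z2 = small_encoder(view1), small_encoder(view2)
# loss = nt_xent_loss(z1, z2)
# loss.backward()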

Related research

04/13/2023
Multi-Mode Online Knowledge Distillation for Self-Supervised Visual Representation Learning
Self-supervised learning (SSL) has made remarkable progress in visual re...

06/21/2021
Simple Distillation Baselines for Improving Small Self-supervised Models
While large self-supervised models have rivalled the performance of thei...

09/30/2022
Slimmable Networks for Contrastive Self-supervised Learning
Self-supervised learning makes great progress in large model pre-trainin...

12/14/2022
Establishing a stronger baseline for lightweight contrastive models
Recent research has reported a performance degradation in self-supervise...

03/14/2023
MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
This paper tackles the problem of semi-supervised video object segmentat...

01/23/2023
A Simple Recipe for Competitive Low-compute Self supervised Vision Models
Self-supervised methods in vision have been mostly focused on large arch...

12/06/2022
Label-free Knowledge Distillation with Contrastive Loss for Light-weight Speaker Recognition
Very deep models for speaker recognition (SR) have demonstrated remarkab...
