Establishing a stronger baseline for lightweight contrastive models

12/14/2022
by Wenye Lin, et al.

Recent research has reported a performance degradation in self-supervised contrastive learning for specially designed efficient networks, such as MobileNet and EfficientNet. A common practice to address this problem is to introduce a pretrained contrastive teacher model and train the lightweight networks with distillation signals generated by the teacher. However, pretraining a teacher model is time- and resource-consuming when one is not already available. In this work, we aim to establish a stronger baseline for lightweight contrastive models without using a pretrained teacher model. Specifically, we show that the optimal recipe for efficient models differs from that of larger models, and that reusing the training settings of ResNet50, as previous research does, is inappropriate. Additionally, we observe a common issue in contrastive learning where either the positive or negative views can be noisy, and propose a smoothed version of the InfoNCE loss to alleviate this problem. As a result, we improve the linear evaluation results from 36.3% to 62.3% for MobileNet-V3-Large and from 42.2% to 65.8% for EfficientNet-B0 on ImageNet, closing the accuracy gap to ResNet50 with 5x fewer parameters. We hope our research will facilitate the usage of lightweight contrastive models.
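
The abstract mentions a smoothed InfoNCE loss for handling noisy positive and negative views, but does not give its exact form. The PyTorch sketch below illustrates one plausible reading, a label-smoothed InfoNCE in which the positive pair keeps 1 - eps of the target probability mass and the remainder is spread over the in-batch negatives. The function name `smoothed_info_nce`, the temperature of 0.1, and the smoothing value are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of a label-smoothed InfoNCE loss (an assumed formulation,
# not necessarily the one proposed in the paper).
import torch
import torch.nn.functional as F

def smoothed_info_nce(query, key, temperature=0.1, smoothing=0.1):
    """query, key: (N, D) embeddings of two augmented views of the same N images."""
    query = F.normalize(query, dim=1)
    key = F.normalize(key, dim=1)

    # Similarity logits: row i's positive is column i; all other columns are negatives.
    logits = query @ key.t() / temperature  # (N, N)
    n = logits.size(0)

    # Smoothed targets: 1 - smoothing on the positive, smoothing spread over negatives.
    targets = torch.full_like(logits, smoothing / (n - 1))
    targets.fill_diagonal_(1.0 - smoothing)

    # Cross-entropy against soft targets; reduces to standard InfoNCE when smoothing = 0.
    log_probs = F.log_softmax(logits, dim=1)
    return -(targets * log_probs).sum(dim=1).mean()

# Usage: loss = smoothed_info_nce(encoder(view_a), encoder(view_b))
```

Spreading a small amount of target mass onto the negatives reduces the penalty when a "negative" in the batch is actually semantically close to the query, which is one way noisy views could be softened.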

Related research
04/19/2021

DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

While self-supervised representation learning (SSL) has received widespr...
10/23/2020

Iterative Graph Self-Distillation

How to discriminatively vectorize graphs is a fundamental challenge that...
07/30/2021

On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals

It is a consensus that small models perform quite poorly under the parad...
09/30/2022

Slimmable Networks for Contrastive Self-supervised Learning

Self-supervised learning makes great progress in large model pre-trainin...
05/07/2019

Contrastive Learning for Lifted Networks

In this work we address supervised learning via lifted network formulati...
07/24/2023

CLIP-KD: An Empirical Study of Distilling CLIP Models

CLIP has become a promising language-supervised visual pre-training fram...
04/21/2023

Learn What NOT to Learn: Towards Generative Safety in Chatbots

Conversational models that are generative and open-domain are particular...