Data-Efficient Contrastive Self-supervised Learning: Easy Examples Contribute the Most

02/18/2023
by Siddharth Joshi, et al.

Self-supervised learning (SSL) learns high-quality representations from large pools of unlabeled training data. As datasets grow larger, it becomes crucial to identify the examples that contribute the most to learning such representations, since this enables efficient SSL by reducing the volume of data required to learn high-quality representations. Nevertheless, quantifying the value of examples for SSL has remained an open question. In this work, we address this question for the first time by proving that the examples that contribute the most to contrastive SSL are those whose augmentations are, in expectation, most similar to the augmentations of other examples. We provide rigorous guarantees for the generalization performance of SSL on such subsets. Empirically, we discover, perhaps surprisingly, that the subsets that contribute the most to SSL are those that contribute the least to supervised learning. Through extensive experiments, we show that our subsets outperform random subsets by more than 3% on CIFAR100, CIFAR10, and STL10. Interestingly, we also find that we can safely exclude 20% [...] downstream task performance.
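The selection criterion described above (keep the examples whose augmentations are, in expectation, most similar to other examples' augmentations) can be illustrated with a minimal sketch. It assumes each example already has embeddings for a few sampled augmented views (from some encoder, not specified here) and estimates the expectation by averaging cosine similarities over those samples. The names `expected_augmentation_similarity` and `select_subset` are hypothetical, and this plain top-k ranking is a simplified illustration, not the authors' subset-selection algorithm.

```python
import math
from typing import List, Sequence

# Hypothetical input layout: aug_embeddings[i][a] is the embedding vector of the
# a-th sampled augmentation of example i (produced by some encoder, assumed here).
Embeddings = List[List[Sequence[float]]]


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0 for _ in v]


def expected_augmentation_similarity(aug_embeddings: Embeddings) -> List[float]:
    """Monte Carlo estimate, per example, of the mean cosine similarity between its
    augmentations and the augmentations of every other example."""
    n = len(aug_embeddings)
    if n < 2:
        raise ValueError("need at least two examples to compare")
    # Mean of the unit-normalized augmentation embeddings of each example.
    # By bilinearity, <mean_i, mean_j> equals the average cosine similarity over
    # all (augmentation of i, augmentation of j) pairs.
    centroids = []
    for augs in aug_embeddings:
        units = [_unit(e) for e in augs]
        dim = len(units[0])
        centroids.append([sum(u[d] for u in units) / len(units) for d in range(dim)])
    scores = []
    for i in range(n):
        total = sum(
            sum(a * b for a, b in zip(centroids[i], centroids[j]))
            for j in range(n)
            if j != i
        )
        scores.append(total / (n - 1))
    return scores


def select_subset(aug_embeddings: Embeddings, k: int) -> List[int]:
    """Return indices of the k examples with the highest expected augmentation
    similarity, highest score first."""
    scores = expected_augmentation_similarity(aug_embeddings)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy 2-D embeddings: examples 0 and 1 point roughly along the x-axis,
    # example 2 is an outlier along the y-axis.
    toy = [
        [[1.0, 0.0], [1.0, 0.1]],
        [[1.0, 0.2], [0.9, 0.0]],
        [[0.0, 1.0], [-0.1, 1.0]],
    ]
    print(select_subset(toy, 2))  # prints [1, 0]
```

Averaging the unit-normalized views of each example before taking dot products gives the same all-pairs average as comparing every augmentation pair directly, but at the cost of one comparison per pair of examples. In the toy input, the isolated example 2 has the lowest score and is the first to be excluded, while the two mutually similar ("easy") examples are kept.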

research
02/13/2020

A Simple Framework for Contrastive Learning of Visual Representations

This paper presents SimCLR: a simple framework for contrastive learning ...
research
05/17/2021

Divide and Contrast: Self-supervised Learning from Uncurated Data

Self-supervised learning holds promise in leveraging large amounts of un...
research
07/18/2023

Towards the Sparseness of Projection Head in Self-Supervised Learning

In recent years, self-supervised learning (SSL) has emerged as a promisi...
research
06/16/2021

Watching Too Much Television is Good: Self-Supervised Audio-Visual Representation Learning from Movies and TV Shows

The abundance and ease of utilizing sound, along with the fact that audi...
research
05/06/2023

PointCMP: Contrastive Mask Prediction for Self-supervised Learning on Point Cloud Videos

Self-supervised learning can extract representations of good quality fro...
research
04/22/2021

Self-Supervised Learning from Semantically Imprecise Data

Learning from imprecise labels such as "animal" or "bird", but making pr...
research
11/24/2021

PSSL: Self-supervised Learning for Personalized Search with Contrastive Sampling

Personalized search plays a crucial role in improving user search experi...