InfoNCE Loss Provably Learns Cluster-Preserving Representations

02/15/2023
by Advait Parulekar, et al.

The goal of contrastive learning is to learn a representation that preserves the underlying clusters in the data by keeping samples with similar content, e.g., the “dogness” of a dog, close to each other in representation space. A common and successful approach to this unsupervised learning problem is to minimize the InfoNCE loss over the training samples, where each sample is paired with its augmentations (positive samples, e.g., rotations or crops) and contrasted against a batch of negative samples (unrelated samples). To the best of our knowledge, it was an open question whether the representation learned by minimizing the InfoNCE loss preserves the underlying data clusters, since the loss only promotes a representation that is faithful to augmentations, i.e., one under which an image and its augmentations have the same representation. Our main result shows that the representation learned by InfoNCE with a finite number of negative samples is also consistent with the clusters in the data, provided that the augmentation sets within clusters, while possibly non-overlapping, are close and intertwined relative to the complexity of the learned function class.
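
For readers unfamiliar with the objective the paper analyzes, the following is a minimal PyTorch sketch of the InfoNCE loss for a single anchor with a finite number K of negatives. The cosine-similarity critic and the temperature value are common defaults assumed here for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor, one augmentation of it (positive),
    and a batch of K unrelated samples (negatives).

    anchor:    (d,)   representation of the original sample
    positive:  (d,)   representation of an augmentation of the anchor
    negatives: (K, d) representations of the negative samples
    """
    # Normalize so the dot product is cosine similarity (a common critic).
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_logit = (anchor @ positive) / temperature    # scalar
    neg_logits = (negatives @ anchor) / temperature  # (K,)

    # (K+1)-way softmax classification with the positive pair as the target:
    # the loss is the negative log-probability assigned to the positive.
    logits = torch.cat([pos_logit.unsqueeze(0), neg_logits])
    return -F.log_softmax(logits, dim=0)[0]
```

Minimizing this loss pulls the anchor toward its augmentation and pushes it away from the K negatives; the paper's result concerns what this implies about cluster structure when K is finite.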
