Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression

05/25/2023
by Yihao Xue, et al.

Contrastive learning (CL) has emerged as a powerful technique for representation learning, with or without label supervision. However, supervised CL is prone to collapsing representations of subclasses within a class by not capturing all their features, and unsupervised CL may suppress harder class-relevant features by focusing on learning easy class-irrelevant features; both significantly compromise representation quality. Yet, there is no theoretical understanding of class collapse or feature suppression at test time. We provide the first unified, theoretically rigorous framework for determining which features are learnt by CL. Our analysis indicates that, perhaps surprisingly, the bias of (stochastic) gradient descent towards finding simpler solutions is a key factor in collapsing subclass representations and suppressing harder class-relevant features. Moreover, we present increasing embedding dimensionality and improving the quality of data augmentations as two theoretically motivated solutions to feature suppression. We also provide the first theoretical explanation for why employing supervised and unsupervised CL together yields higher-quality representations, even when using commonly-used stochastic gradient methods.
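To make the setting concrete, here is a minimal sketch of the unsupervised contrastive objective the abstract refers to, in the common InfoNCE form: for each anchor embedding, a positive (an augmented view of the same input) is pulled close while negatives (other inputs) are pushed away. The function names, the temperature value, and the tiny 2-D embeddings are illustrative choices, not the paper's notation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE loss for one anchor: negative log-softmax of the
    positive's similarity against the positive plus all negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # log-sum-exp with max-shift for numerical stability
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_sum)

# An aligned positive with an orthogonal negative gives a lower loss
# than the reverse arrangement.
anchor = [1.0, 0.0]
aligned = info_nce(anchor, [1.0, 0.0], [[0.0, 1.0]])
misplaced = info_nce(anchor, [0.0, 1.0], [[1.0, 0.0]])
```

The paper's class-collapse and feature-suppression results concern which features the minimizers of objectives like this one actually encode; the supervised variant differs only in using label information to choose positives.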


