Understanding self-supervised Learning Dynamics without Contrastive Pairs

02/12/2021
by Yuandong Tian, et al.

Contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point (positive pairs) and maximizing the distance between views of different data points (negative pairs). However, recent approaches such as BYOL and SimSiam show remarkable performance without negative pairs, raising a fundamental theoretical question: how can SSL with only positive pairs avoid representational collapse? We study the nonlinear learning dynamics of non-contrastive SSL in simple linear networks. Our analysis yields conceptual insights into how non-contrastive SSL methods learn, how they avoid representational collapse, and how multiple factors, like predictor networks, stop-gradients, exponential moving averages, and weight decay, all come into play. Our simple theory recapitulates the results of real-world ablation studies on both STL-10 and ImageNet. Furthermore, motivated by our theory, we propose a novel approach that directly sets the predictor based on the statistics of its inputs. In the case of linear predictors, our approach outperforms gradient training of the predictor by 5%, and on ImageNet it performs comparably with more complex two-layer non-linear predictors that employ BatchNorm. Code is released at https://github.com/facebookresearch/luckmatters/tree/master/ssl.
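The two mechanisms the abstract highlights, a stop-gradient/EMA target branch and a predictor set directly from the statistics of its inputs, can be illustrated in a few lines of linear algebra. Below is a minimal numpy sketch under several assumptions: the encoders are single linear maps, the "statistics" are an exponential moving average of the online representation's correlation matrix, and the predictor is set to a regularized matrix square root of that correlation. This is one concrete reading of the approach described in the abstract, not the authors' exact recipe, and all names (W, W_a, W_p, direct_set_predictor) are illustrative.

```python
import numpy as np

# Hedged sketch of one non-contrastive SSL step on linear networks:
# online encoder W, target encoder W_a (EMA copy, stop-gradient branch),
# linear predictor W_p set directly from input statistics, plus weight decay.

rng = np.random.default_rng(0)
d_in, d_out = 16, 8                                 # input / representation dims
W   = rng.normal(scale=0.1, size=(d_out, d_in))     # online encoder
W_a = W.copy()                                      # target encoder (EMA of W)
W_p = np.eye(d_out)                                 # linear predictor

lr, ema, wd, eps = 0.05, 0.996, 1e-4, 0.1

def augment(x, sigma=0.1):
    """Toy augmentation: additive Gaussian noise."""
    return x + sigma * rng.normal(size=x.shape)

def direct_set_predictor(F, eps):
    """Set the predictor from statistics of its inputs: a symmetric PSD matrix
    whose eigenvalues are the square roots of F's eigenvalues plus a small
    ridge, i.e. a regularized matrix square root of the correlation F."""
    s, U = np.linalg.eigh(F)
    s = np.clip(s, 0.0, None)
    p = np.sqrt(s) + eps * np.sqrt(s.max())
    return (U * p) @ U.T                             # U diag(p) U^T

F = np.eye(d_out)  # running correlation of online representations

for step in range(200):
    x = rng.normal(size=(d_in,))
    x1, x2 = augment(x), augment(x)

    f1 = W @ x1                  # online representation
    f2 = W_a @ x2                # target representation (no gradient flows here)
    z1 = W_p @ f1                # prediction of the target

    # Track correlation statistics and set the predictor directly from them.
    F = ema * F + (1 - ema) * np.outer(f1, f1)
    W_p = direct_set_predictor(F, eps)

    # Gradient of 0.5 * ||W_p W x1 - sg(W_a x2)||^2 w.r.t. W, plus weight decay.
    grad_W = np.outer(W_p.T @ (z1 - f2), x1) + wd * W
    W -= lr * grad_W

    # Target network is an exponential moving average of the online network.
    W_a = ema * W_a + (1 - ema) * W
```

The point of the sketch is the interplay the abstract describes: the stop-gradient keeps the target branch out of the gradient, the EMA smooths it, weight decay shrinks the online weights, and a directly-set predictor ties the learning dynamics to the eigenstructure of the representation correlation rather than to a separately trained network.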

Related research

10/11/2021
Towards Demystifying Representation Learning with Non-contrastive Self-supervision
Non-contrastive methods of self-supervised learning (such as BYOL and Si...

03/04/2023
Towards a Unified Theoretical Understanding of Non-contrastive Learning via Rank Differential Mechanism
Recently, a variety of methods under the name of non-contrastive learnin...

10/01/2020
Understanding Self-supervised Learning with Dual Deep Networks
We propose a novel theoretical framework to understand self-supervised l...

02/09/2023
The Edge of Orthogonality: A Simple View of What Makes BYOL Tick
Self-predictive unsupervised learning methods such as BYOL or SimSiam ha...

04/20/2021
SelfReg: Self-supervised Contrastive Regularization for Domain Generalization
In general, an experimental environment for deep learning assumes that t...

04/07/2023
On the Importance of Contrastive Loss in Multimodal Learning
Recently, contrastive learning approaches (e.g., CLIP (Radford et al., 2...

05/31/2022
Contrasting quadratic assignments for set-based representation learning
The standard approach to contrastive learning is to maximize the agreeme...
