Predictor networks and stop-grads provide implicit variance regularization in BYOL/SimSiam

12/09/2022
by Manu Srinath Halvagal et al.

Self-supervised learning (SSL) learns useful representations from unlabelled data by training networks to produce invariant outputs for pairs of augmented versions of the same input. Non-contrastive methods avoid collapse either by directly regularizing the covariance matrix of the network outputs or through asymmetric loss architectures, two seemingly unrelated approaches. Here, building on DirectPred, we lay out a theoretical framework that reconciles these two views. We derive analytical expressions for the representational learning dynamics in linear networks and, by expressing them in the eigenspace of the embedding covariance matrix, where the solutions decouple, we reveal the mechanism and the conditions under which the predictor and stop-gradient provide implicit variance regularization. These insights allow us to formulate a new isotropic loss function that equalizes the contribution of each eigenmode and renders learning more robust. Finally, we show empirically that our findings translate to nonlinear networks trained on CIFAR-10 and STL-10.
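To make the two ingredients discussed above concrete, here is a minimal PyTorch sketch (not the authors' code) pairing a SimSiam-style asymmetric loss, i.e. a trainable predictor plus a stop-gradient on the target branch, with a DirectPred-style closed-form predictor computed from the eigendecomposition of the embedding covariance matrix, the quantity in whose eigenspace the learning dynamics decouple. All class and function names, dimensions, and normalization constants are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimSiamSketch(nn.Module):
    """Linear encoder plus predictor, with a stop-gradient on the target branch."""

    def __init__(self, dim_in: int = 32, dim_emb: int = 16):
        super().__init__()
        # Single linear layers, matching the linear-network setting of the analysis.
        self.encoder = nn.Linear(dim_in, dim_emb, bias=False)
        self.predictor = nn.Linear(dim_emb, dim_emb, bias=False)  # W_P

    def forward(self, x1, x2):
        z1, z2 = self.encoder(x1), self.encoder(x2)
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # Symmetrized negative cosine loss; .detach() implements the stop-gradient
        # that, together with the predictor, breaks the symmetry between branches.
        return -0.5 * (F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
                       + F.cosine_similarity(p2, z1.detach(), dim=-1).mean())


@torch.no_grad()
def directpred_predictor(z, eps=1e-4):
    """DirectPred-style closed-form predictor from the embedding covariance.

    With C = U diag(lam) U^T, returns W_P = U diag(sqrt(lam / lam_max) + eps) U^T.
    The exact normalization is an assumption here; see Tian et al. (2021).
    """
    C = torch.cov(z.T)             # covariance of embeddings, (dim_emb, dim_emb)
    lam, U = torch.linalg.eigh(C)  # eigendecomposition; lam in ascending order
    s = (lam.clamp(min=0.0) / lam.max()).sqrt() + eps
    return U @ torch.diag(s) @ U.T


# Toy usage with random stand-ins for two augmented views of a batch.
x1, x2 = torch.randn(256, 32), torch.randn(256, 32)
model = SimSiamSketch()
loss = model(x1, x2)
loss.backward()
with torch.no_grad():  # optionally set the predictor in closed form instead of training it
    model.predictor.weight.copy_(directpred_predictor(model.encoder(x1)))
```

Setting the predictor in closed form rather than training it by gradient descent is what makes the dynamics analytically tractable in the DirectPred setting that the paper builds on.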

