Representation Learning Dynamics of Self-Supervised Models

09/05/2023
by Pascal Esser, et al.

Self-Supervised Learning (SSL) is an important paradigm for learning representations from unlabelled data, and SSL with neural networks has been highly successful in practice. However, current theoretical analyses of SSL are mostly restricted to generalisation error bounds. In contrast, learning dynamics often provide a precise characterisation of the behaviour of neural-network-based models, but, so far, they are mainly known in supervised settings. In this paper, we study the learning dynamics of SSL models, specifically of representations obtained by minimising contrastive and non-contrastive losses. We show that a naive extension of the dynamics of multivariate regression to SSL leads to learning trivial scalar representations, which demonstrates dimension collapse in SSL. Consequently, we formulate SSL objectives with orthogonality constraints on the weights and derive the exact (network-width-independent) learning dynamics of SSL models trained using gradient descent on the Grassmannian manifold. We also argue that the infinite-width approximation of SSL models deviates significantly from the neural tangent kernel approximations of supervised models. We numerically illustrate the validity of our theoretical findings and discuss how the presented results provide a framework for further theoretical analysis of contrastive and non-contrastive SSL.
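As a rough numerical illustration of the constrained setting the abstract describes, the sketch below trains a linear representation f(x) = W^T x with a simple non-contrastive alignment loss between two augmented views, while keeping W^T W = I via gradient descent on the Grassmannian manifold (tangent-space projection of the gradient followed by a QR retraction). This is a minimal sketch under assumed choices: the specific loss, step size, data model, and the helper loss_and_grad are illustrative, not the paper's exact objective or derived dynamics.

    import numpy as np

    rng = np.random.default_rng(0)

    d, k, n = 20, 4, 500                        # input dim, representation dim, sample count
    X = rng.normal(size=(n, d))                 # data points
    X_aug = X + 0.1 * rng.normal(size=(n, d))   # augmented (positive) views

    W, _ = np.linalg.qr(rng.normal(size=(d, k)))  # orthonormal init: W^T W = I_k

    def loss_and_grad(W):
        # Non-contrastive alignment loss: mean squared distance between
        # the representations of the two views, with f(x) = W^T x.
        Z, Z_aug = X @ W, X_aug @ W
        diff = Z - Z_aug
        loss = 0.5 * np.mean(np.sum(diff ** 2, axis=1))
        grad = (X - X_aug).T @ diff / n         # Euclidean gradient w.r.t. W
        return loss, grad

    eta = 0.1
    for _ in range(200):
        _, G = loss_and_grad(W)
        # Project onto the tangent space at W (directions orthogonal to the
        # column span of W), then retract onto the manifold via QR.
        G_tan = G - W @ (W.T @ G)
        W, _ = np.linalg.qr(W - eta * G_tan)

    loss, _ = loss_and_grad(W)
    print("final loss:", loss)
    print("orthogonality error:", np.linalg.norm(W.T @ W - np.eye(k)))

Note that running plain (unconstrained) gradient descent on this same loss drives W towards zero, i.e. the trivial collapsed representation the abstract refers to; the projection-and-retraction step is what rules that solution out here.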
