Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation

06/01/2023
by Runtian Zhai, et al.

Good data augmentation is one of the key factors behind the empirical success of self-supervised representation learning, such as contrastive learning and masked language modeling, yet a theoretical understanding of its role in learning good representations remains limited. Recent work has connected self-supervised learning to approximating the top eigenspace of a graph Laplacian operator, and learning a linear probe on top of such features naturally corresponds to RKHS regression. In this work, we use this insight to perform a statistical analysis of augmentation-based pretraining. We start from the isometry property, a key geometric characterization of the target function given by the augmentation. Our first main theorem provides, for an arbitrary encoder, near-tight bounds on both the estimation error incurred by fitting the linear probe on top of the encoder and the approximation error determined by how well the RKHS the encoder learns fits the target function. Our second main theorem specifically addresses the case where the encoder extracts the top-d eigenspace of a Monte Carlo approximation of the underlying kernel computed from the finite pretraining samples. Our analysis completely disentangles the effects of the model and the augmentation. A key ingredient in our analysis is the augmentation complexity, which we use to quantitatively compare different augmentations and analyze their impact on downstream performance on synthetic and real datasets.
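The pipeline the abstract describes can be illustrated end to end on a toy problem. Below is a minimal sketch (not the paper's implementation) under simplifying assumptions: 1-D synthetic data from two clusters, a Gaussian-noise augmentation so the augmentation density A(x | x̄) has a closed form, a Monte Carlo estimate of the augmentation-induced kernel built from unlabeled pretraining samples, an "encoder" given by the top-d eigenvectors of that kernel's Gram matrix (extended to new points via the Nyström formula), and a ridge-regression linear probe fit on a small labeled set. All names and parameter values are illustrative.

```python
# Hedged sketch of the abstract's pipeline: Monte Carlo kernel from an
# augmentation, top-d eigenspace as the encoder, linear probe on top.
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3            # augmentation noise level (assumed)
n_pretrain, d = 500, 8

# Unlabeled pretraining samples from two clusters (cluster id = downstream label).
labels_pre = rng.integers(0, 2, size=n_pretrain)
x_pre = rng.normal(loc=2.0 * labels_pre - 1.0, scale=0.2)

def aug_density(x, x_bar):
    """Gaussian augmentation density A(x | x̄), evaluated pairwise."""
    diff = x[:, None] - x_bar[None, :]
    return np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mc_kernel(x1, x2, x_bar):
    """Monte Carlo kernel estimate: k̂(x1, x2) = mean_i A(x1 | x̄_i) A(x2 | x̄_i)."""
    return aug_density(x1, x_bar) @ aug_density(x2, x_bar).T / len(x_bar)

# "Encoder": top-d eigenvectors of the Gram matrix on the pretraining samples,
# extended to new points with the Nyström formula.
K = mc_kernel(x_pre, x_pre, x_pre)
evals, evecs = np.linalg.eigh(K)
evals, evecs = evals[::-1][:d], evecs[:, ::-1][:, :d]

def encode(x):
    return mc_kernel(x, x_pre, x_pre) @ evecs / np.clip(evals, 1e-12, None)

# Downstream task: ridge-regression linear probe on a few labeled samples.
n_label, lam = 40, 1e-3
y_tr = rng.integers(0, 2, size=n_label)
x_tr = rng.normal(loc=2.0 * y_tr - 1.0, scale=0.2)
Phi = encode(x_tr)
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ y_tr)

# Evaluate the probe on fresh test points.
y_te = rng.integers(0, 2, size=200)
x_te = rng.normal(loc=2.0 * y_te - 1.0, scale=0.2)
acc = np.mean((encode(x_te) @ w > 0.5) == y_te)
print(f"linear-probe accuracy: {acc:.2f}")
```

The Nyström extension here is only a convenient stand-in for a learned encoder; in the setting the paper analyzes, the encoder would be trained with a self-supervised objective, and the theorems bound the estimation and approximation errors of the probe fit on top of it.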

