Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning

05/31/2021
by Zixin Wen, et al.

How do neural networks trained by contrastive learning extract features from unlabeled data? Why does contrastive learning usually need much stronger data augmentations than supervised learning to ensure good representations? These questions involve both the optimization and statistical aspects of deep learning, yet they can hardly be answered by analyzing supervised learning, where fitting the target function is the sole objective. Indeed, in self-supervised learning one must relate the optimization and generalization of neural networks to how they encode the latent structures in the data, which we refer to as the feature learning process. In this work, we formally study how contrastive learning learns feature representations for neural networks by analyzing its feature learning process. We consider data comprising two types of features: semantically aligned sparse features, which we want to learn from, and dense features, which we want to avoid. Theoretically, we prove that contrastive learning with ReLU networks learns the desired sparse features provided that proper augmentations are adopted. We present an underlying principle, called feature decoupling, to explain the effect of augmentations: we characterize how augmentations reduce the correlation of the dense features between positive samples while keeping the correlation of the sparse features intact, thereby forcing the neural network to learn from the self-supervision of the sparse features. Empirically, we verify that the feature decoupling principle matches the underlying mechanism of contrastive learning in practice.
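
The setting the abstract describes (a ReLU network trained on positive pairs produced by augmentation, where sparse features survive the augmentation and dense features do not) can be sketched in a few lines. The following is a minimal, illustrative PyTorch example, not the authors' code: the synthetic data model, the augmentation, and all hyperparameters are assumptions chosen only to mirror the sparse-versus-dense story and the feature decoupling effect.

    # Minimal, illustrative sketch (not the paper's code): a one-hidden-layer ReLU
    # encoder trained with an InfoNCE-style contrastive loss on two augmented views.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    d, m, batch, tau = 128, 256, 64, 0.5   # input dim, hidden width, batch size, temperature

    encoder = nn.Sequential(nn.Linear(d, m), nn.ReLU())   # ReLU network
    optimizer = torch.optim.SGD(encoder.parameters(), lr=0.1)

    def sparse_signal():
        # Toy "semantically aligned" sparse features: a few strong coordinates per sample.
        x = torch.zeros(batch, d)
        idx = torch.randint(0, d, (batch, 4))
        x.scatter_(1, idx, 1.0)
        return x

    def make_view(sparse):
        # Illustrative augmentation: keep the sparse part, draw fresh dense noise.
        # Because the dense part is drawn independently for each view, it is
        # decorrelated between the two positive samples while the sparse part stays
        # shared -- the "feature decoupling" effect described in the abstract.
        return sparse + 0.1 * torch.randn_like(sparse)

    def info_nce(z1, z2):
        # Standard InfoNCE: positives are the matching rows of the two view batches.
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / tau
        labels = torch.arange(z1.size(0))
        return F.cross_entropy(logits, labels)

    for step in range(100):
        x = sparse_signal()
        loss = info_nce(encoder(make_view(x)), encoder(make_view(x)))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Under the paper's argument, training on views of this kind drives the ReLU units toward the shared sparse coordinates rather than the view-specific dense noise, since only the sparse features provide consistent self-supervision across positive pairs.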

Related research:

11/19/2022 · Local Contrastive Feature learning for Tabular Data
Contrastive self-supervised learning has been successfully used in many ...

05/12/2022 · The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
Recently the surprising discovery of the Bootstrap Your Own Latent (BYOL...

04/06/2022 · RODD: A Self-Supervised Approach for Robust Out-of-Distribution Detection
Recent studies have addressed the concern of detecting and rejecting the...

06/07/2023 · On the Joint Interaction of Models, Data, and Features
Learning features from data is one of the defining characteristics of de...

10/07/2022 · Temporal Feature Alignment in Contrastive Self-Supervised Learning for Human Activity Recognition
Automated Human Activity Recognition has long been a problem of great in...

04/29/2021 · Hyperspherically Regularized Networks for BYOL Improves Feature Uniformity and Separability
Bootstrap Your Own Latent (BYOL) introduced an approach to self-supervis...

03/03/2021 · Contrastive learning of strong-mixing continuous-time stochastic processes
Contrastive learning is a family of self-supervised methods where a mode...
