For self-supervised learning, Rationality implies generalization, provably

10/16/2020
by   Yamini Bansal, et al.
0

We prove a new upper bound on the generalization gap of classifiers that are obtained by first using self-supervision to learn a representation r of the training data, and then fitting a simple (e.g., linear) classifier g to the labels. Specifically, we show that (under the assumptions described below) the generalization gap of such classifiers tends to zero if 𝖢(g) ≪ n, where 𝖢(g) is an appropriately-defined measure of the simple classifier g's complexity, and n is the number of training samples. We stress that our bound is independent of the complexity of the representation r. We do not make any structural or conditional-independence assumptions on the representation-learning task, which can use the same training dataset that is later used for classification. Rather, we assume that the training procedure satisfies certain natural noise-robustness (adding small amount of label noise causes small degradation in performance) and rationality (getting the wrong label is not better than getting no label at all) conditions that widely hold across many standard architectures. We show that our bound is non-vacuous for many popular representation-learning based classifiers on CIFAR-10 and ImageNet, including SimCLR, AMDIM and MoCo.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/01/2022

Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

We introduce Simplicial Embeddings (SEMs) as a way to constrain the enco...
research
10/19/2022

Training set cleansing of backdoor poisoning by self-supervised representation learning

A backdoor or Trojan attack is an important type of data poisoning attac...
research
08/11/2021

Asymptotic optimality and minimal complexity of classification by random projection

The generalization error of a classifier is related to the complexity of...
research
03/28/2021

Representation Learning by Ranking under multiple tasks

In recent years, representation learning has become the research focus o...
research
08/03/2023

Feature Noise Boosts DNN Generalization under Label Noise

The presence of label noise in the training data has a profound impact o...
research
06/13/2020

Generalization by Recognizing Confusion

A recently-proposed technique called self-adaptive training augments mod...
research
05/07/2019

Forest Representation Learning Guided by Margin Distribution

In this paper, we reformulate the forest representation learning approac...

Please sign up or login with your details

Forgot password? Click here to reset