Functional Regularization for Representation Learning: A Unified Theoretical Perspective

08/06/2020
by Siddhant Garg, et al.

Unsupervised and self-supervised learning approaches have become a crucial tool to learn representations for downstream prediction tasks. While these approaches are widely used in practice and achieve impressive empirical gains, their theoretical understanding largely lags behind. Towards bridging this gap, we present a unifying perspective where several such approaches can be viewed as imposing a regularization on the representation via a learnable function using unlabeled data. We propose a discriminative theoretical framework for analyzing the sample complexity of these approaches. Our sample complexity bounds show that, with carefully chosen hypothesis classes to exploit the structure in the data, such functional regularization can prune the hypothesis space and help reduce the labeled data needed. We then provide two concrete examples of functional regularization, one using auto-encoders and the other using masked self-supervision, and apply the framework to quantify the reduction in the sample complexity bound. We also provide complementary empirical results for the examples to support our analysis.
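To make the auto-encoder example concrete, here is a minimal toy sketch (our own construction, not the paper's exact setup or notation): a linear encoder `W` is fit jointly to a supervised loss on a few labeled points and a reconstruction loss on a larger unlabeled pool. The reconstruction term plays the role of the learnable functional regularizer on the representation h = Wx, steering the encoder toward the low-dimensional structure that the labels also depend on. All names, dimensions, and the data-generating process below are illustrative assumptions.

```python
import numpy as np

# Toy sketch (our assumption, not the paper's exact construction):
# functional regularization via an auto-encoder. A shared linear encoder W
# is trained on (a) a supervised loss over a small labeled set and
# (b) a reconstruction loss over a larger unlabeled pool; the latter acts
# as a learnable regularizer on the representation h = W x.

rng = np.random.default_rng(0)
d, k = 10, 3                       # ambient dim, representation dim
A = rng.normal(size=(k, d))        # hidden low-dimensional structure
c = np.ones(k)                     # labels depend only on the latent z

def sample(n):
    z = rng.normal(size=(n, k))
    x = z @ A + 0.1 * rng.normal(size=(n, d))   # low-rank signal + noise
    return x, z @ c

X_unlab, _ = sample(200)           # plentiful unlabeled data
X_lab, y_lab = sample(20)          # scarce labeled data
n_lab, n_unlab = len(y_lab), len(X_unlab)

W = 0.1 * rng.normal(size=(k, d))  # encoder (shared representation)
D = 0.1 * rng.normal(size=(d, k))  # decoder (auto-encoder head)
v = np.zeros(k)                    # linear predictor on top of h
lam, lr = 0.1, 0.01                # regularization weight, step size

def sup_mse():
    return np.mean((X_lab @ W.T @ v - y_lab) ** 2)

init_mse = sup_mse()
for _ in range(2000):
    H = X_lab @ W.T                # labeled representations
    err = H @ v - y_lab
    Hu = X_unlab @ W.T             # unlabeled representations
    E = Hu @ D.T - X_unlab         # reconstruction residual
    # gradients of  mean(err^2) + lam * (1/n_unlab) * ||E||_F^2
    gv = 2 * H.T @ err / n_lab
    gW = (2 * np.outer(v, X_lab.T @ err) / n_lab
          + lam * 2 * D.T @ E.T @ X_unlab / n_unlab)
    gD = lam * 2 * E.T @ Hu / n_unlab
    v -= lr * gv
    W -= lr * gW
    D -= lr * gD

final_mse = sup_mse()
print(f"supervised MSE: {init_mse:.3f} -> {final_mse:.3f}")
```

Because the unlabeled inputs share the same low-dimensional structure that generates the labels, the reconstruction term pulls the encoder toward that subspace, so the predictor can fit the labels from far fewer labeled examples; this is the pruning-of-the-hypothesis-space effect the abstract's sample complexity bounds formalize. The masked self-supervision example from the paper would replace the reconstruction term with a loss for predicting masked-out input coordinates from the rest.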


