On the Provable Advantage of Unsupervised Pretraining

03/02/2023
by Jiawei Ge, et al.

Unsupervised pretraining, which learns a useful representation from a large amount of unlabeled data to facilitate the learning of downstream tasks, is a critical component of modern large-scale machine learning systems. Despite its tremendous empirical success, rigorous theoretical understanding of why unsupervised pretraining generally helps remains rather limited: most existing results are restricted to particular methods or approaches for unsupervised pretraining with specialized structural assumptions. This paper studies a generic framework, where the unsupervised representation learning task is specified by an abstract class of latent variable models Φ and the downstream task is specified by a class of prediction functions Ψ. We consider a natural approach of using Maximum Likelihood Estimation (MLE) for unsupervised pretraining and Empirical Risk Minimization (ERM) for learning downstream tasks. We prove that, under a mild "informative" condition, our algorithm achieves an excess risk of 𝒪̃(√(𝒞_Φ/m) + √(𝒞_Ψ/n)) for downstream tasks, where 𝒞_Φ, 𝒞_Ψ are complexity measures of the function classes Φ, Ψ, and m, n are the numbers of unlabeled and labeled data respectively. Compared to the baseline of 𝒪̃(√(𝒞_Φ∘Ψ/n)) achieved by performing supervised learning using only the labeled data, our result rigorously shows the benefit of unsupervised pretraining when m ≫ n and 𝒞_Φ∘Ψ > 𝒞_Ψ. This paper further shows that our generic framework covers a wide range of approaches for unsupervised pretraining, including factor models, Gaussian mixture models, and contrastive learning.
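To make the two-stage recipe concrete, the sketch below instantiates it with a Gaussian mixture model as the latent variable class Φ (one of the examples the framework covers) and a linear head as the downstream class Ψ: MLE on m unlabeled samples, then ERM on n labeled samples using the frozen representation. This is a minimal illustrative sketch, not the paper's algorithm verbatim; the synthetic data, model choices, and scikit-learn calls are assumptions made for the example.

# Minimal sketch of MLE pretraining + ERM fine-tuning, assuming a Gaussian
# mixture model for Phi and a logistic-regression head for Psi (illustrative).
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Large unlabeled set (m samples) and a much smaller labeled set (n samples).
m, n, d = 5000, 200, 10
X_unlabeled = rng.normal(size=(m, d))
X_labeled = rng.normal(size=(n, d))
y_labeled = (X_labeled[:, 0] > 0).astype(int)

# Stage 1: unsupervised pretraining via MLE over the latent variable class.
# GaussianMixture.fit maximizes the likelihood of the unlabeled data with EM.
phi = GaussianMixture(n_components=3, random_state=0).fit(X_unlabeled)

# The learned representation: posterior responsibilities over latent components.
def representation(X):
    return phi.predict_proba(X)

# Stage 2: empirical risk minimization on the labeled data, on top of the
# frozen pretrained representation.
psi = LogisticRegression().fit(representation(X_labeled), y_labeled)

# Downstream prediction composes the two stages: psi applied to representation.
X_test = rng.normal(size=(5, d))
print(psi.predict(representation(X_test)))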


