Blessing of Class Diversity in Pre-training

09/07/2022
by Yulai Zhao, et al.

This paper presents a new statistical analysis aiming to explain the recent remarkable success of pre-training techniques in natural language processing (NLP). We prove that when the classes of the pre-training task (e.g., different words in the masked language model task) are sufficiently diverse, in the sense that the least singular value of the last linear layer in pre-training (denoted ν̃) is large, pre-training can significantly improve the sample efficiency of downstream tasks. Specifically, we show that the transfer learning excess risk enjoys an O(1/(ν̃√n)) rate, in contrast to the O(1/√m) rate of standard supervised learning. Here, n is the number of pre-training samples and m is the number of samples in the downstream task, and typically n ≫ m. Our proof relies on a vector-form Rademacher complexity chain rule for disassembling composite function classes and a modified self-concordance condition. These techniques may be of independent interest.
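To make the role of ν̃ concrete, the following is a minimal, hypothetical sketch (not from the paper): it treats ν̃ as the least singular value of a last-layer weight matrix and compares the two excess-risk rates from the abstract, O(1/(ν̃√n)) for transfer learning and O(1/√m) for standard supervised learning. The weight matrix W, the dimensions, and the sample sizes below are illustrative placeholders, not the paper's actual model or data.

import numpy as np

# Hypothetical last-layer weight matrix from pre-training (classes x features).
# In the paper's setting, nu_tilde is the least singular value of this linear
# layer; the matrix below is a random placeholder, not a trained model.
rng = np.random.default_rng(0)
num_classes, feature_dim = 5000, 256  # placeholder "vocabulary" x hidden size
W = rng.normal(size=(num_classes, feature_dim)) / np.sqrt(feature_dim)

# nu_tilde: least singular value of the last linear layer.
nu_tilde = np.linalg.svd(W, compute_uv=False).min()

# Illustrative sample sizes with n >> m, as in the abstract.
n = 10_000_000  # pre-training samples
m = 10_000      # downstream-task samples

transfer_rate = 1.0 / (nu_tilde * np.sqrt(n))  # O(1/(nu_tilde * sqrt(n)))
supervised_rate = 1.0 / np.sqrt(m)             # O(1/sqrt(m))

print(f"nu_tilde (least singular value): {nu_tilde:.4f}")
print(f"transfer learning rate  O(1/(nu*sqrt(n))): {transfer_rate:.2e}")
print(f"supervised learning rate O(1/sqrt(m)):     {supervised_rate:.2e}")

With a reasonably large ν̃ and n ≫ m, the transfer-learning bound is much smaller than the supervised one, which is the qualitative message of the abstract.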

Related research

01/31/2021 · Adversarial Contrastive Pre-training for Protein Sequences
Recent developments in Natural Language Processing (NLP) demonstrate tha...

04/12/2020 · Pre-training Text Representations as Meta Learning
Pre-training text representations has recently been shown to significant...

08/03/2023 · Curricular Transfer Learning for Sentence Encoded Tasks
Fine-tuning language models in a downstream task is the standard approac...

06/21/2023 · Task-Robust Pre-Training for Worst-Case Downstream Adaptation
Pre-training has achieved remarkable success when transferred to downstr...

07/31/2023 · Structural Transfer Learning in NL-to-Bash Semantic Parsers
Large-scale pre-training has made progress in many fields of natural lan...

04/18/2021 · On the Influence of Masking Policies in Intermediate Pre-training
Current NLP models are predominantly trained through a pretrain-then-fin...

02/02/2022 · Relative Position Prediction as Pre-training for Text Encoders
Meaning is defined by the company it keeps. However, company is two-fold...
