On Masked Pre-training and the Marginal Likelihood

06/01/2023
by Pablo Moreno-Muñoz, et al.

Masked pre-training removes random input dimensions and learns a model that can predict the missing values. Empirical results indicate that this intuitive form of self-supervised learning yields models that generalize very well to new domains. A theoretical understanding is, however, lacking. This paper shows that masked pre-training with a suitable cumulative scoring function corresponds to maximizing the model's marginal likelihood, which is de facto the Bayesian model selection measure of generalization. Beyond shedding light on the success of masked pre-training, this insight also suggests that Bayesian models can be trained with appropriately designed self-supervision. Empirically, we confirm the developed theory and explore the main learning principles of masked pre-training in large language models.
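
To make the claimed correspondence concrete, here is a rough sketch in our own notation (not necessarily the paper's). For a Bayesian model with prior p(\theta) over parameters, the chain rule factorizes the log marginal likelihood of data \mathbf{y} = (y_1, \ldots, y_N) into a sum of log posterior-predictive scores, one per held-out element given those already observed:

\log p(\mathbf{y}) \;=\; \log \int p(\mathbf{y} \mid \theta)\, p(\theta)\, d\theta \;=\; \sum_{i=1}^{N} \log p\big(y_i \mid y_{1:i-1}\big).

Since this identity holds for every ordering of the data, averaging over random permutations \pi leaves the value unchanged:

\mathbb{E}_{\pi}\!\left[\, \sum_{i=1}^{N} \log p\big(y_{\pi(i)} \mid y_{\pi(1)}, \ldots, y_{\pi(i-1)}\big) \right] \;=\; \log p(\mathbf{y}).

The inner sum can be read as a cumulative masked-prediction score: start with everything masked, reveal dimensions in a random order, and accumulate the log predictive density of each newly revealed value given the ones already visible. Maximizing such a score over model parameters is then maximizing the log marginal likelihood, which is the sense in which masked pre-training with a suitable cumulative scoring function performs Bayesian model selection.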

Related research

05/19/2023 - Cross-Lingual Supervision improves Large Language Models Pre-training
The recent rapid progress in pre-training Large Language Models has reli...

11/04/2021 - Leveraging Time Irreversibility with Order-Contrastive Pre-training
Label-scarce, high-dimensional domains such as healthcare present a chal...

09/12/2022 - PreSTU: Pre-Training for Scene-Text Understanding
The ability to read and reason about texts in an image is often lacking ...

01/02/2021 - Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting
In this paper, we generalize text infilling (e.g., masked language model...

10/12/2020 - Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model
Masked Language Model (MLM) framework has been widely adopted for self-s...

06/11/2023 - Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute
Self-supervised learning (SSL) has led to great strides in speech proces...

05/30/2019 - Unsupervised pre-training helps to conserve views from input distribution
We investigate the effects of the unsupervised pre-training method under...
