On a Benefit of Mask Language Modeling: Robustness to Simplicity Bias

10/11/2021
by   Ting-Rui Chiang, et al.

Despite the success of pretrained masked language models (MLM), why MLM pretraining is useful is still a question not fully answered. In this work, we show theoretically and empirically that MLM pretraining makes models robust to lexicon-level spurious features, which partly answers the question. We theoretically show that, when we can model the distribution of a spurious feature Π conditioned on the context, then (1) Π is at least as informative as the spurious feature, and (2) learning from Π is at least as simple as learning from the spurious feature. Therefore, MLM pretraining rescues the model from the simplicity bias caused by the spurious feature. We also explore the efficacy of MLM pretraining in causal settings. Finally, we close the gap between our theories and real-world practices by conducting experiments on the hate speech detection and named entity recognition tasks.


