On the Inconsistencies of Conditionals Learned by Masked Language Models

12/30/2022
by Tom Young, et al.

Learning to predict masked tokens in a sequence has been shown to be a powerful pretraining objective for large-scale language models. After training, such masked language models (MLMs) can provide distributions of tokens conditioned on bidirectional context. In this short draft, we show that such bidirectional conditionals often exhibit considerable inconsistencies, i.e., taken together they cannot be derived from a single coherent joint distribution. We empirically quantify these inconsistencies in the simple scenario of bigrams for two common styles of masked language models: T5-style and BERT-style. For example, we show that T5 models are often inconsistent about their own preference between two similar bigrams. Such inconsistencies may represent a theoretical pitfall for research on sampling sequences from the bidirectional conditionals learned by BERT-style MLMs. The phenomenon also means that T5-style MLMs capable of infilling will generate discrepant results depending on how much of the input is masked, which may raise a particular trust issue.
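To make the notion of inconsistency concrete, here is a minimal sketch of one possible check for the bigram case. It is not necessarily the metric used in the paper. It relies on the fact that if p(x1 | x2) and p(x2 | x1) both come from one joint distribution, then for any two candidate first tokens a, a' and second tokens b, b', the product of the four probability ratios around the 2x2 cell equals 1, so its logarithm is 0. The function names, the toy vocabulary, and all probability values are hypothetical and chosen for illustration only; no real model is queried.

```python
import math


def conditionals_from_joint(joint, vocab1, vocab2):
    """Derive both conditionals from a joint table joint[(a, b)] = p(x1=a, x2=b).

    Returns two dicts:
      p1_given_2[(a, b)] = p(x1=a | x2=b)
      p2_given_1[(b, a)] = p(x2=b | x1=a)
    """
    p1_given_2, p2_given_1 = {}, {}
    for b in vocab2:
        col = sum(joint[(a, b)] for a in vocab1)
        for a in vocab1:
            p1_given_2[(a, b)] = joint[(a, b)] / col
    for a in vocab1:
        row = sum(joint[(a, b)] for b in vocab2)
        for b in vocab2:
            p2_given_1[(b, a)] = joint[(a, b)] / row
    return p1_given_2, p2_given_1


def log_cycle_ratio(p1_given_2, p2_given_1, a, a2, b, b2):
    """Log of the product of four conditional ratios around a 2x2 cell.

    The value is 0 whenever both conditionals are derived from the same
    joint distribution; a nonzero value signals an inconsistency.
    """
    return (
        math.log(p1_given_2[(a, b)]) - math.log(p1_given_2[(a2, b)])
        + math.log(p2_given_1[(b, a2)]) - math.log(p2_given_1[(b2, a2)])
        + math.log(p1_given_2[(a2, b2)]) - math.log(p1_given_2[(a, b2)])
        + math.log(p2_given_1[(b2, a)]) - math.log(p2_given_1[(b, a)])
    )


# Hypothetical toy vocabulary and joint distribution (illustrative values).
vocab1, vocab2 = ["new", "old"], ["york", "town"]
joint = {
    ("new", "york"): 0.4, ("new", "town"): 0.1,
    ("old", "york"): 0.1, ("old", "town"): 0.4,
}

# Case 1: conditionals derived from the joint are consistent by construction.
c12, c21 = conditionals_from_joint(joint, vocab1, vocab2)
consistent = log_cycle_ratio(c12, c21, "new", "old", "york", "town")

# Case 2: hand-set conditionals standing in for two separately queried
# model outputs; each is a valid distribution, but no joint yields both.
m12 = {("new", "york"): 0.9, ("old", "york"): 0.1,
       ("new", "town"): 0.5, ("old", "town"): 0.5}
m21 = {("york", "new"): 0.5, ("town", "new"): 0.5,
       ("york", "old"): 0.5, ("town", "old"): 0.5}
inconsistent = log_cycle_ratio(m12, m21, "new", "old", "york", "town")

print(f"consistent case:   {consistent:.6f}")
print(f"inconsistent case: {inconsistent:.6f}")
```

In practice the two hand-set tables would be replaced by probabilities read off an MLM with the first or the second token masked. A log ratio far from 0 then indicates that the model's two conditionals disagree about the relative likelihood of the bigrams involved.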


