Mixed-effects transformers for hierarchical adaptation

05/03/2022
by Julia White, et al.

Language use differs dramatically from context to context. To some degree, modern language models like GPT-3 are able to account for such variance by conditioning on a string of previous input text, or prompt. Yet prompting is ineffective when contexts are sparse, out-of-sample, or extra-textual; for instance, accounting for when and where the text was produced or who produced it. In this paper, we introduce the mixed-effects transformer (MET), a novel approach for learning hierarchically-structured prefixes – lightweight modules prepended to the input – to account for structured variation. Specifically, we show how the popular class of mixed-effects models may be extended to transformer-based architectures using a regularized prefix-tuning procedure with dropout. We evaluate this approach on several domain-adaptation benchmarks, finding that it efficiently adapts to novel contexts with minimal data while still effectively generalizing to unseen contexts.
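The abstract does not include an implementation, but the central construction (a shared "fixed-effect" prefix plus per-group "random-effect" prefixes regularized with dropout) can be sketched in a few lines of PyTorch. The sketch below is illustrative rather than the authors' released code: the class name HierarchicalPrefix, the hyperparameters, and the exact placement of dropout are assumptions, and the full MET attaches such prefixes to a frozen pretrained transformer and can use deeper hierarchies.

import torch
import torch.nn as nn

class HierarchicalPrefix(nn.Module):
    """Illustrative mixed-effects-style prefix: a shared prefix used for every
    example plus per-group prefixes that are sometimes dropped out during
    training, so the shared prefix alone must handle unseen groups."""

    def __init__(self, num_groups, prefix_len, hidden_dim, group_dropout=0.1):
        super().__init__()
        # Shared ("fixed-effect") prefix parameters.
        self.shared = nn.Parameter(torch.randn(prefix_len, hidden_dim) * 0.02)
        # One learned prefix per group (e.g., per author, domain, or time period).
        self.group = nn.Embedding(num_groups, prefix_len * hidden_dim)
        nn.init.normal_(self.group.weight, std=0.02)
        self.prefix_len = prefix_len
        self.hidden_dim = hidden_dim
        self.group_dropout = group_dropout

    def forward(self, group_ids):
        batch = group_ids.shape[0]
        shared = self.shared.unsqueeze(0).expand(batch, -1, -1)
        group = self.group(group_ids).view(batch, self.prefix_len, self.hidden_dim)
        if self.training:
            # Randomly drop whole group prefixes so the model also learns to
            # generate from the shared prefix alone (generalization to new groups).
            keep = (torch.rand(batch, 1, 1, device=group.device)
                    > self.group_dropout).float()
            group = group * keep
        # Concatenate on the sequence dimension: [shared prefix ; group prefix].
        # These embeddings would be prepended to the token embeddings of a
        # frozen language model, as in standard prefix-tuning.
        return torch.cat([shared, group], dim=1)

# Example: two inputs from groups 3 and 7 yield prefixes of shape (2, 20, 768).
prefixes = HierarchicalPrefix(num_groups=100, prefix_len=10, hidden_dim=768)
prefix_embeds = prefixes(torch.tensor([3, 7]))

At evaluation time, a previously unseen group could be routed to the shared prefix alone (or to an average of group prefixes), which is one way the dropout-trained shared prefix can support the unseen-context generalization described in the abstract.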


