A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

10/07/2020
by Nikunj Saunshi, et al.

Autoregressive language models pretrained on large corpora have been successful at solving downstream tasks, even with zero-shot usage. However, there is little theoretical justification for their success. This paper considers the following questions: (1) Why should learning the distribution of natural language help with downstream classification tasks? (2) Why do features learned using language modeling help solve downstream tasks with linear classifiers? For (1), we hypothesize, and verify empirically, that classification tasks of interest can be reformulated as next word prediction tasks, thus making language modeling a meaningful pretraining task. For (2), we analyze properties of the cross-entropy objective to show that ϵ-optimal language models in cross-entropy (log-perplexity) learn features that are 𝒪(√(ϵ))-good on natural linear classification tasks, thus demonstrating mathematically that doing well on language modeling can be beneficial for downstream tasks. We perform experiments to verify assumptions and validate theoretical results. Our theoretical insights motivate a simple alternative to the cross-entropy objective that performs well on some linear classification tasks.
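To make the first hypothesis concrete, here is a minimal sketch of recasting binary sentiment classification as next-word prediction. It assumes the Hugging Face transformers library and a GPT-2 checkpoint; the prompt template and the label words " great"/" terrible" are illustrative choices, not the authors' exact setup.

```python
# Sketch: reformulating sentiment classification as next-word prediction.
# The prompt template and label words below are illustrative assumptions,
# not the paper's exact construction.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def classify(review: str) -> str:
    # Append a prompt so the class label becomes the next word the LM predicts.
    prompt = review + " This movie review is"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        next_token_logits = model(input_ids).logits[0, -1]
    # Score each class by the next-token logit of its label word's first token.
    label_words = {"positive": " great", "negative": " terrible"}
    scores = {
        label: next_token_logits[tokenizer.encode(word)[0]].item()
        for label, word in label_words.items()
    }
    return max(scores, key=scores.get)

print(classify("I loved every minute of it."))  # expected: positive
```

The second question then reduces to training a linear classifier on top of such next-word probability distributions: the paper's guarantee is that if the language model is within ϵ of the optimal cross-entropy, a linear classifier over its outputs is 𝒪(√ϵ)-good on classification tasks that admit this kind of reformulation.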



Related research

02/24/2022 · NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better
Effectively finetuning pretrained language models (PLMs) is critical for...

12/10/2019 · Zero-shot Text Classification With Generative Language Models
This work investigates the use of natural language to enable zero-shot m...

06/17/2021 · Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning
Pretrained language models have achieved state-of-the-art performance wh...

09/09/2021 · Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning
Recent prompt-based approaches allow pretrained language models to achie...

02/09/2023 · Toolformer: Language Models Can Teach Themselves to Use Tools
Language models (LMs) exhibit remarkable abilities to solve new tasks fr...

02/17/2023 · A Simplistic Model of Neural Scaling Laws: Multiperiodic Santa Fe Processes
It was observed that large language models exhibit a power-law decay of ...

05/24/2023 · Trade-Offs Between Fairness and Privacy in Language Modeling
Protecting privacy in contemporary NLP models is gaining in importance. ...
