Language Models are Few-Shot Learners

05/28/2020
by Tom B. Brown, et al.

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
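To make the few-shot setting concrete, the following is a minimal sketch (in Python) of how a task can be specified purely through the text prompt, with no gradient updates: an instruction, a handful of worked demonstrations, and a new query are concatenated into a single string that the model is asked to continue. The complete() function is a hypothetical placeholder for a call to an autoregressive language model such as GPT-3; only the prompt construction reflects the setup described in the abstract.

def build_few_shot_prompt(instruction, demonstrations, query):
    """Concatenate an instruction, K worked examples, and a new query into one prompt."""
    lines = [instruction, ""]
    for question, answer in demonstrations:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {query}")
    lines.append("A:")
    return "\n".join(lines)

def complete(prompt):
    """Hypothetical stand-in for sampling a continuation from the language model."""
    raise NotImplementedError("Replace with an actual model call.")

if __name__ == "__main__":
    # Few-shot 3-digit addition, one of the on-the-fly tasks mentioned in the abstract.
    demos = [
        ("What is 123 plus 456?", "579"),
        ("What is 250 plus 371?", "621"),
    ]
    prompt = build_few_shot_prompt("Answer the arithmetic questions.", demos,
                                   "What is 812 plus 109?")
    print(prompt)                # the text the model is asked to continue
    # answer = complete(prompt)  # the model's continuation would be its answer, e.g. "921"

Zero-shot and one-shot evaluation correspond to passing zero or one demonstration pairs; in every case the model weights are left unchanged.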


Related Research

12/31/2020
Making Pre-trained Language Models Better Few-shot Learners
The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot...

03/29/2020
Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
Pre-trained neural language models bring significant improvement for var...

09/20/2023
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Generative Large Language Models (LLMs) have achieved remarkable advance...

06/12/2023
Gradient Ascent Post-training Enhances Language Model Generalization
In this work, we empirically show that updating pretrained LMs (350M, 1....

06/13/2023
TART: A plug-and-play Transformer module for task-agnostic reasoning
Large language models (LLMs) exhibit in-context learning abilities which...

09/08/2021
TruthfulQA: Measuring How Models Mimic Human Falsehoods
We propose a benchmark to measure whether a language model is truthful i...

05/29/2023
Self Information Update for Large Language Models through Mitigating Exposure Bias
Current LLMs have demonstrated remarkable capabilities in addressing use...

Code Repositories

gpt-3
GPT-3: Language Models are Few-Shot Learners

gpt3-list
List of things that people are claiming are enabled by GPT-3; unverified, but links to sources.

test
Measuring Massive Multitask Language Understanding | ICLR 2021

gpt_at_home
Distributed training of GPT-3 in a collaborative effort
