Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning

06/17/2021
by Colin Wei, et al.

Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text – the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.
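To make the setting concrete, the sketch below is a toy version of head tuning on HMM-generated data in the spirit of the experiments described above; it is an illustration, not the authors' code. It samples token sequences from a small synthetic HMM, computes the exact posterior over the final hidden state with the forward recursion, and fits a linear head on those posterior features. The state and vocabulary sizes, the random Dirichlet parameters, the parity label on the last hidden state, and the use of scikit-learn's LogisticRegression as the head are all assumptions made here for illustration.

```python
# Illustrative sketch only: a toy analogue of head tuning on HMM-generated data.
# The exact posterior over the final hidden state stands in for the frozen
# pretrained representation; the downstream label is a function of that state.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
H, V, T, N = 5, 20, 10, 2000   # hidden states, vocabulary size, sequence length, samples

# Random row-stochastic HMM parameters (assumed, not taken from the paper).
A = rng.dirichlet(np.ones(H), size=H)   # transitions: A[i, j] = P(h_{t+1}=j | h_t=i)
B = rng.dirichlet(np.ones(V), size=H)   # emissions:   B[i, w] = P(x_t=w | h_t=i)
pi = rng.dirichlet(np.ones(H))          # initial state distribution

def sample_sequence():
    """Sample a length-T token sequence; return it with the last hidden state."""
    hs, xs = [], []
    h = rng.choice(H, p=pi)
    for _ in range(T):
        hs.append(h)
        xs.append(rng.choice(V, p=B[h]))
        h = rng.choice(H, p=A[h])
    return xs, hs[-1]

def last_state_posterior(xs):
    """Forward recursion, normalized at every step, returning P(h_T | x_1..x_T)."""
    alpha = pi * B[:, xs[0]]
    alpha /= alpha.sum()
    for x in xs[1:]:
        alpha = (alpha @ A) * B[:, x]
        alpha /= alpha.sum()
    return alpha

data = [sample_sequence() for _ in range(N)]
X = np.stack([last_state_posterior(xs) for xs, _ in data])  # "frozen" features
y = np.array([h_last % 2 for _, h_last in data])            # toy downstream label

split = int(0.8 * N)
head = LogisticRegression(max_iter=1000).fit(X[:split], y[:split])  # linear head
print("held-out accuracy:", head.score(X[split:], y[split:]))
```

In the paper's terms, the posterior features play the role of the pretrained model's representation, and the logistic regression is the classification head learned on top of it; the analysis shows when such a simple head suffices and when prompt tuning helps under weaker conditions.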


Related research

02/24/2022 · NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better
10/07/2020 · A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks
10/22/2022 · PATS: Sensitivity-aware Noisy Learning for Pretrained Language Models
04/19/2023 · NetGPT: Generative Pretrained Transformer for Network Traffic
06/04/2022 · Instance-wise Prompt Tuning for Pretrained Language Models
03/15/2022 · Data Contamination: From Memorization to Exploitation
10/30/2022 · Parameter-Efficient Tuning Makes a Good Classification Head
