Gradient Ascent Post-training Enhances Language Model Generalization

06/12/2023
by Dongkeun Yoon, et al.

In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B parameters) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalization capabilities across diverse NLP tasks. Specifically, we show that GAP can make LMs comparable to LMs 2-3x their size across 12 different NLP tasks. We also show that applying GAP to out-of-distribution corpora leads to the most reliable performance improvements. Our findings indicate that GAP can be a promising method for improving the generalization capability of LMs without any task-specific fine-tuning.
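As a rough illustration of the core operation, the sketch below performs a few gradient ascent steps on a pretrained causal LM using unlabeled text, i.e., it steps the parameters in the direction that increases the language modeling loss. This is only a minimal sketch assuming a Hugging Face causal LM; the model choice (facebook/opt-350m), optimizer, learning rate, step count, and placeholder texts are illustrative assumptions, not the paper's reported settings.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any pretrained causal LM of comparable size works here.
model_name = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.train()

# Illustrative optimizer and learning rate, not the paper's configuration.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)

# Placeholder stand-ins for random, unlabeled text drawn from a corpus.
texts = [
    "An arbitrary passage of unlabeled text drawn from a random corpus.",
    "Another unrelated snippet with no task labels attached.",
]

num_steps = 3  # "just a few steps" of gradient ascent
for step in range(num_steps):
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Ignore padding positions when computing the LM loss.
    labels = batch["input_ids"].masked_fill(batch["attention_mask"] == 0, -100)
    outputs = model(**batch, labels=labels)
    # Gradient *ascent*: maximize the LM loss by descending on its negation.
    (-outputs.loss).backward()
    optimizer.step()
    optimizer.zero_grad()

After these few ascent steps, the updated model would be evaluated zero-shot on downstream tasks, with no task-specific fine-tuning involved.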


