Generating Training Data with Language Models: Towards Zero-Shot Language Understanding

02/09/2022
by Yu Meng, et al.

Pretrained language models (PLMs) have demonstrated remarkable performance in various natural language processing tasks: Unidirectional PLMs (e.g., GPT) are well known for their superior text generation capabilities; bidirectional PLMs (e.g., BERT) have been the prominent choice for natural language understanding (NLU) tasks. While both types of models have achieved promising few-shot learning performance, their potential for zero-shot learning has been underexplored. In this paper, we present a simple approach that uses both types of PLMs for fully zero-shot learning of NLU tasks without requiring any task-specific data: A unidirectional PLM generates class-conditioned texts guided by prompts, which are used as the training data for fine-tuning a bidirectional PLM. With quality training data selected based on the generation probability and regularization techniques (label smoothing and temporal ensembling) applied to the fine-tuning stage for better generalization and stability, our approach demonstrates strong performance across seven classification tasks of the GLUE benchmark (e.g., 72.3/73.8 on MNLI-m/mm and 92.8 on SST-2), significantly outperforming zero-shot prompting methods and even achieving results comparable to strong few-shot approaches that use 32 training samples per class.
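To make the pipeline in the abstract concrete, the sketch below shows one way the generation and filtering stages could look. It is not the authors' released implementation: GPT-2 stands in for the unidirectional generator, the SST-2-style prompt wording, sampling settings, selection ratio, and the average-log-probability quality score are all illustrative assumptions, and temporal ensembling is omitted (only PyTorch's built-in label smoothing is shown for the fine-tuning loss).

# Minimal sketch (assumptions noted above): a unidirectional PLM generates
# class-conditioned texts from label prompts, each generation is scored by its
# average token log-probability, and the highest-scoring texts are kept as
# pseudo-labeled training data for fine-tuning a bidirectional PLM.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
generator = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

# Hypothetical label-conditioned prompts for a sentiment task such as SST-2.
prompts = {
    "positive": 'The movie review in positive sentiment is: "',
    "negative": 'The movie review in negative sentiment is: "',
}

def avg_log_prob(token_ids: torch.Tensor) -> float:
    # Simplified quality score: negative mean cross-entropy of the generated
    # tokens under the generator itself (higher = more probable text).
    with torch.no_grad():
        out = generator(token_ids, labels=token_ids)
    return -out.loss.item()

synthetic_data = []
for label, prompt in prompts.items():
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    outputs = generator.generate(
        input_ids,
        do_sample=True,
        top_p=0.9,
        max_new_tokens=40,
        num_return_sequences=8,
        pad_token_id=tokenizer.eos_token_id,
    )
    for seq in outputs:
        gen_ids = seq[input_ids.shape[1]:]  # strip the prompt tokens
        text = tokenizer.decode(gen_ids, skip_special_tokens=True)
        score = avg_log_prob(gen_ids.unsqueeze(0))
        synthetic_data.append({"text": text, "label": label, "score": score})

# Keep only the most probable generations (illustrative 50% cutoff); these would
# then be used to fine-tune a bidirectional PLM (e.g., RoBERTa) with label
# smoothing for regularization.
synthetic_data.sort(key=lambda d: d["score"], reverse=True)
selected = synthetic_data[: len(synthetic_data) // 2]
classifier_loss = torch.nn.CrossEntropyLoss(label_smoothing=0.1)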

