Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

08/23/2023
by Jiasheng Ye, et al.

The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalability of large language models. Despite this potential, it remains unclear whether diffusion language models can solve general language tasks comparably to their autoregressive counterparts. This paper demonstrates that scaling diffusion models with respect to data, model size, and tasks can effectively make them strong language learners. We build competent diffusion language models at scale by first acquiring knowledge from massive data via masked language modeling pretraining, exploiting the intrinsic connection between masked language models and diffusion language models. We then reprogram pretrained masked language models into diffusion language models via diffusive adaptation, in which task-specific finetuning and instruction finetuning are explored to unlock their versatility in solving general language tasks. Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks. We further find that instruction finetuning can elicit zero-shot and few-shot in-context learning abilities that help tackle many unseen tasks by following natural language instructions, and that the resulting models show promise in advanced and challenging abilities such as reasoning.


