Transcending Scaling Laws with 0.1

10/20/2022
by   Yi Tay, et al.
2

Scaling language models improves performance but comes with significant computational costs. This paper proposes UL2R, a method that substantially improves existing language models and their scaling curves with a relatively tiny amount of extra compute. The key idea is to continue training a state-of-the-art large language model (e.g., PaLM) on a few more steps with UL2's mixture-of-denoiser objective. We show that, with almost negligible extra computational costs and no new sources of data, we are able to substantially improve the scaling properties of large language models on downstream metrics. In this paper, we continue training PaLM with UL2R, introducing a new set of models at 8B, 62B, and 540B scale which we call U-PaLM. Impressively, at 540B scale, we show an approximately 2x computational savings rate where U-PaLM achieves the same performance as the final PaLM 540B model at around half its computational budget (i.e., saving ∼4.4 million TPUv4 hours). We further show that this improved scaling curve leads to 'emergent abilities' on challenging BIG-Bench tasks – for instance, U-PaLM does much better than PaLM on some tasks or demonstrates better quality at much smaller scale (62B as opposed to 540B). Overall, we show that U-PaLM outperforms PaLM on many few-shot setups, i.e., English NLP tasks (e.g., commonsense reasoning, question answering), reasoning tasks with chain-of-thought (e.g., GSM8K), multilingual tasks (MGSM, TydiQA), MMLU and challenging BIG-Bench tasks. Finally, we provide qualitative examples showing the new capabilities of U-PaLM for single and multi-span infilling.

READ FULL TEXT
research
01/28/2022

Chain of Thought Prompting Elicits Reasoning in Large Language Models

Although scaling up language model size has reliably improved performanc...
research
12/13/2021

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

Scaling language models with more data, compute and parameters has drive...
research
04/05/2022

PaLM: Scaling Language Modeling with Pathways

Large language models have been shown to achieve remarkable performance ...
research
05/24/2023

How Predictable Are Large Language Model Capabilities? A Case Study on BIG-bench

We investigate the predictability of large language model (LLM) capabili...
research
07/20/2023

Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

Language models can be prompted to reason through problems in a manner t...
research
01/30/2023

Specializing Smaller Language Models towards Multi-Step Reasoning

The surprising ability of Large Language Models (LLMs) to perform well o...
research
11/03/2022

Inverse scaling can become U-shaped

Although scaling language models improves performance on a range of task...

Please sign up or login with your details

Forgot password? Click here to reset