LIMA: Less Is More for Alignment

05/18/2023
by Chunting Zhou, et al.

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large-scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases; this statistic is as high as 58% when compared to Bard and 65% versus DaVinci003, which was trained with human feedback. Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high-quality output.
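The "standard supervised loss" the abstract refers to is next-token cross-entropy computed on the response tokens of each curated prompt–response pair. A minimal PyTorch sketch of that loss is below; the `-100` ignore-index convention for masking prompt tokens is common practice in causal-LM fine-tuning, not code from the paper itself:

```python
import torch
import torch.nn.functional as F

def supervised_loss(logits, input_ids, prompt_len):
    """Next-token cross-entropy over response tokens only.

    logits:     (seq_len, vocab_size) model outputs for one example
    input_ids:  (seq_len,) token ids of prompt + response
    prompt_len: number of leading tokens that belong to the prompt
    """
    # Shift so position t predicts token t+1.
    shift_logits = logits[:-1]
    shift_labels = input_ids[1:].clone()
    # Mask prompt positions; cross_entropy skips labels == -100.
    shift_labels[: prompt_len - 1] = -100
    return F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)

# Toy example with random "model" outputs (illustrative only).
vocab_size, seq_len = 50, 10
torch.manual_seed(0)
logits = torch.randn(seq_len, vocab_size)
input_ids = torch.randint(0, vocab_size, (seq_len,))
loss = supervised_loss(logits, input_ids, prompt_len=4)
```

Because prompt positions are masked, gradients flow only through the response: perturbing the logits at prompt positions leaves the loss unchanged, which is what lets a small curated set of responses teach format without re-teaching the prompt distribution.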


research
07/12/2023

Instruction Mining: High-Quality Instruction Data Selection for Large Language Models

Large language models typically undergo two training stages, pretraining...
research
04/14/2023

OpenAssistant Conversations – Democratizing Large Language Model Alignment

Aligning large language models (LLMs) with human preferences has proven ...
research
02/10/2023

The Wisdom of Hindsight Makes Language Models Better Instruction Followers

Reinforcement learning has seen wide success in finetuning large languag...
research
02/16/2023

Pretraining Language Models with Human Preferences

Language models (LMs) are pretrained to imitate internet text, including...
research
05/04/2023

Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

Recent AI-assistant agents, such as ChatGPT, predominantly rely on super...
research
08/08/2023

Shepherd: A Critic for Language Model Generation

As large language models improve, there is increasing interest in techni...
research
08/22/2023

Towards an On-device Agent for Text Rewriting

Large Language Models (LLMs) have demonstrated impressive capabilities f...
