RL with KL penalties is better viewed as Bayesian inference

05/23/2022
by Tomasz Korbak et al.

Reinforcement learning (RL) is frequently employed in fine-tuning large language models (LMs), such as GPT-3, to penalise them for undesirable features of generated sequences, such as offensiveness, social bias, harmfulness or falsehood. The RL formulation involves treating the LM as a policy and updating it to maximise the expected value of a reward function which captures human preferences, such as non-offensiveness. In this paper, we analyse challenges associated with treating a language model as an RL policy and show how avoiding those challenges requires moving beyond the RL paradigm. We start by observing that the standard RL approach is flawed as an objective for fine-tuning LMs because it leads to distribution collapse: turning the LM into a degenerate distribution. Then, we analyse KL-regularised RL, a widely used recipe for fine-tuning LMs, which additionally constrains the fine-tuned LM to stay close to its original distribution in terms of Kullback-Leibler (KL) divergence. We show that KL-regularised RL is equivalent to variational inference: approximating a Bayesian posterior which specifies how to update a prior LM to conform with evidence provided by the reward function. We argue that this Bayesian inference view of KL-regularised RL is more insightful than the typically employed RL perspective. The Bayesian inference view explains how KL-regularised RL avoids the distribution collapse problem and offers a first-principles derivation for its objective. While this objective happens to be equivalent to RL (with a particular choice of parametric reward), there exist other objectives for fine-tuning LMs which are no longer equivalent to RL. That observation leads to a more general point: RL is not an adequate formal framework for problems such as fine-tuning language models. These problems are best viewed as Bayesian inference: approximating a pre-defined target distribution.
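The equivalence the abstract refers to can be sketched in a few lines. The notation below (π_θ for the fine-tuned LM, π_0 for the original LM, r for the reward, β for the KL coefficient, Z for the normalising constant) follows common usage but is not fixed by the abstract itself. The KL-regularised objective is

\[
J_{\mathrm{KL}}(\theta) \;=\; \mathbb{E}_{x \sim \pi_\theta}\!\big[\, r(x) \,\big] \;-\; \beta\, D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_0\right).
\]

Define the target (posterior) distribution

\[
p^*(x) \;=\; \frac{1}{Z}\, \pi_0(x)\, \exp\!\big(r(x)/\beta\big), \qquad Z \;=\; \sum_x \pi_0(x)\, \exp\!\big(r(x)/\beta\big).
\]

Expanding the KL divergence between the policy and this target gives

\[
D_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, p^*\right) \;=\; \mathbb{E}_{x \sim \pi_\theta}\!\left[ \log \pi_\theta(x) - \log \pi_0(x) - r(x)/\beta \right] + \log Z \;=\; -\tfrac{1}{\beta}\, J_{\mathrm{KL}}(\theta) + \log Z.
\]

Since log Z does not depend on θ, maximising J_KL(θ) is the same as minimising D_KL(π_θ ‖ p*), i.e. variational inference with π_0 as the prior and exp(r(x)/β) as the evidence term. Because p* retains probability mass wherever π_0 does, this objective does not push the model toward a degenerate distribution.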


Related research

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models (05/25/2023)
Learning from human feedback has been shown to improve text-to-image mod...

On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting (06/01/2022)
The availability of large pre-trained models is changing the landscape o...

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control (11/09/2016)
This paper proposes a general method for improving the structure and qua...

Aligning Language Models with Preferences through f-divergence Minimization (02/16/2023)
Aligning language models with preferences can be posed as approximating ...

q-Munchausen Reinforcement Learning (05/16/2022)
The recently successful Munchausen Reinforcement Learning (M-RL) feature...

Fine-Tuning Language Models from Human Preferences (09/18/2019)
Reward learning enables the application of reinforcement learning (RL) t...

Principles of Bayesian Inference using General Divergence Criteria (02/26/2018)
When it is acknowledged that all candidate parameterised statistical mod...
