Controlling Conditional Language Models with Distributional Policy Gradients

12/01/2021
by Tomasz Korbak, et al.

Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a wide range of tasks. However, because of their generic training objective, these models often fail to meet downstream requirements (e.g. they hallucinate facts in abstractive summarization or produce wrongly formatted output in code generation). This raises the question of how to adapt pretrained generative models to a new task without destroying their capabilities. Recent work has suggested solving this problem by representing task-specific requirements through energy-based models (EBMs) and approximating these EBMs using distributional policy gradients (DPG). Unfortunately, this approach is limited to unconditional distributions, represented by unconditional EBMs. In this paper, we extend this approach to conditional tasks by proposing Conditional DPG (CDPG). We evaluate CDPG on three different control objectives across two tasks: summarization with T5 and code generation with GPT-Neo. Our results show that fine-tuning with CDPG robustly moves these pretrained models closer to meeting the control objectives and, in contrast with baseline approaches, does not result in catastrophic forgetting.
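To make the idea concrete, below is a minimal sketch of the conditional-DPG objective on a toy problem. It is not the paper's implementation: the policy is a tiny categorical model rather than T5 or GPT-Neo, the constraint function b(x, c), the context set, and the choice to sample from the current policy are illustrative assumptions, and the partition function Z_c is computed exactly only because the toy sample space is small (in practice it is estimated by importance sampling).

import torch

# Toy sketch of the CDPG objective (hypothetical setup, not the paper's code).
# For each context c, the target distribution is p(x|c) = P_c(x) / Z_c with
# P_c(x) = a(x|c) * b(x, c), where a is the frozen pretrained model and
# b(x, c) in {0, 1} says whether x satisfies the control objective.

vocab_size, num_contexts, num_samples = 20, 4, 64

# pi_theta(x|c): toy trainable policy, one categorical distribution per context.
logits = torch.zeros(num_contexts, vocab_size, requires_grad=True)

# a(x|c): frozen "pretrained" conditional model (here just a random distribution).
a = torch.softmax(torch.randn(num_contexts, vocab_size), dim=-1)

# b(x, c): binary constraint, e.g. "only even-indexed tokens are acceptable".
b = (torch.arange(vocab_size) % 2 == 0).float().expand(num_contexts, vocab_size)

optimizer = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    pi = torch.softmax(logits, dim=-1)
    loss = 0.0
    for c in range(num_contexts):
        # Sample from the current policy and compute importance weights
        # w = p(x|c) / pi_theta(x|c); Z_c is exact here because the toy
        # sample space is tiny.
        x = torch.multinomial(pi[c].detach(), num_samples, replacement=True)
        P = a[c, x] * b[c, x]
        Z = (a[c] * b[c]).sum()
        w = P / (Z * pi[c, x].detach())
        # Monte Carlo estimate of the cross-entropy gradient for context c:
        # grad CE(p(.|c), pi_theta(.|c)) = -E_{x ~ pi}[ w * grad log pi_theta(x|c) ].
        loss = loss - (w * torch.log(pi[c, x])).mean()
    (loss / num_contexts).backward()
    optimizer.step()
    optimizer.zero_grad()

# After training, pi_theta(x|c) should place (almost) no mass on tokens that
# violate the constraint while staying close to a(x|c) on the remaining ones.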

