GROOT: Corrective Reward Optimization for Generative Sequential Labeling

09/29/2022
by Kazuma Hashimoto et al.

Sequential labeling is a fundamental NLP task, forming the backbone of many applications. Supervised learning of Seq2Seq models (like T5) has shown great success on these problems. However, there remains a significant disconnect between the training objectives of these models and the metrics and desiderata we care about in practical applications. For example, a practical sequence tagging application may want to optimize for a certain precision-recall trade-off (of the top-k predictions), which is quite different from the standard objective of maximizing the likelihood of the gold-labeled sequence. To bridge this gap, we propose GROOT, a simple yet effective framework for Generative Reward Optimization Of Text sequences. GROOT works by training a generative sequential labeling model to match the decoder output distribution with that of the (black-box) reward function. Using an iterative training regime, we first generate prediction candidates, then correct errors in them, and finally contrast those candidates (based on their reward values). As demonstrated via extensive experiments on four public benchmarks, GROOT significantly improves all reward metrics. Furthermore, GROOT also leads to improvements in the overall decoder distribution, as evidenced by the quality gains of the top-k candidates.
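To make the generate-correct-contrast loop concrete, here is a minimal sketch of one GROOT-style training iteration. It assumes a Hugging Face style Seq2Seq model (e.g., T5) with batch size 1; the names `reward_fn` and `correct_fn` are hypothetical stand-ins for the black-box reward and the corrective step described in the abstract, and the KL-matching loss is one plausible way to "match the decoder output distribution with that of the reward function", not the authors' actual implementation.

```python
# Sketch of one GROOT-style iteration: generate candidates, correct them,
# then contrast them by reward. Hypothetical API; assumes a Hugging Face
# Seq2Seq model (e.g., T5) and a single input (batch size 1).
import torch
import torch.nn.functional as F


def sequence_logprob(model, tokenizer, inputs, text):
    """Total log-probability the model assigns to `text` given `inputs`."""
    labels = tokenizer(text, return_tensors="pt").input_ids
    out = model(**inputs, labels=labels)
    # out.loss is the mean token-level NLL; scale back to a summed log-prob.
    return -out.loss * labels.numel()


def groot_step(model, tokenizer, inputs, reward_fn, correct_fn, k=8):
    # 1) Generate k candidate label sequences for the input.
    candidate_ids = model.generate(**inputs, num_beams=k, num_return_sequences=k)
    candidates = tokenizer.batch_decode(candidate_ids, skip_special_tokens=True)

    # 2) Correct errors in the candidates (correct_fn is a placeholder for
    #    the paper's corrective step, e.g. repairing malformed label spans).
    corrected = [correct_fn(c) for c in candidates]

    # 3) Contrast the candidates: a softmax over their reward values defines
    #    a target distribution, and the model's own distribution over the
    #    same candidates is trained to match it via a KL term.
    rewards = torch.tensor([reward_fn(c) for c in corrected])
    target = F.softmax(rewards, dim=0)

    logps = torch.stack(
        [sequence_logprob(model, tokenizer, inputs, c) for c in corrected]
    )
    model_dist = F.log_softmax(logps, dim=0)

    # F.kl_div expects log-probabilities as input and probabilities as target.
    loss = F.kl_div(model_dist, target, reduction="sum")
    loss.backward()
    return loss.item()
```

Iterating this step regenerates candidates from the updated model, so the contrastive signal tracks the decoder's current failure modes rather than a fixed candidate pool.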


