
Make The Most of Prior Data: A Solution for Interactive Text Summarization with Preference Feedback

by   Duy Hung Nguyen, et al.
Deakin University

For summarization, human preference is critical for steering summarizer outputs toward human interests, because ground-truth summaries are scarce and ambiguous. Practical settings require dynamic exchanges between the human and the AI agent, in which feedback arrives online, a few examples at a time. In this paper, we introduce a new framework for training summarization models interactively with preference feedback. By properly leveraging offline data and a novel reward model, we improve both ROUGE scores and sample efficiency. Experiments on three diverse datasets confirm the benefit of the proposed framework in active, few-shot, and online settings of preference learning.
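The abstract does not spell out the reward model's objective, but reward learning from pairwise preference feedback is conventionally built on the Bradley-Terry model: the reward model is trained so that the summary the annotator preferred scores higher than the rejected one. The sketch below shows that standard pairwise objective in plain NumPy; the function name and inputs are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def preference_loss(r_preferred, r_rejected):
    """Bradley-Terry negative log-likelihood for pairwise preference feedback.

    r_preferred / r_rejected: scalar reward-model scores for the summary the
    annotator preferred vs. the one they rejected. Minimizing this loss pushes
    the reward model to rank the preferred summary higher.
    (Illustrative sketch, not the paper's exact reward model.)
    """
    margin = np.asarray(r_preferred, dtype=float) - np.asarray(r_rejected, dtype=float)
    # -log(sigmoid(margin)) = log(1 + exp(-margin)), numerically stable via log1p
    return float(np.mean(np.log1p(np.exp(-margin))))

# The wider the positive margin (preferred summary scored higher), the lower the loss.
loss_good_ranking = preference_loss([2.0], [0.0])
loss_bad_ranking = preference_loss([0.0], [2.0])
```

With equal scores the loss is exactly log 2, the chance-level value; training the reward model means driving the loss below that by widening the margin on human-labeled pairs.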


