RewardsOfSum: Exploring Reinforcement Learning Rewards for Summarisation

06/08/2021
by Jacob Parnell, et al.

To date, most abstractive summarisation models have relied on variants of the negative log-likelihood (NLL) as their training objective. In some cases, reinforcement learning has been added to train the models with an objective that is closer to their evaluation measures (e.g. ROUGE). However, the reward function used within the reinforcement learning approach can play a key role in performance and is still partially unexplored. For this reason, in this paper we propose two reward functions for the task of abstractive summarisation. The first function, referred to as RwB-Hinge, dynamically selects the samples for the gradient update. The second function, nicknamed RISK, leverages a small pool of strong candidates to inform the reward. In the experiments, we probe the proposed approach by fine-tuning an NLL-pretrained model over nine summarisation datasets of diverse size and nature. The experimental results show a consistent improvement over the negative log-likelihood baselines.
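The abstract only names the two reward functions, so the following is a minimal sketch under stated assumptions, not the authors' implementation. For RwB-Hinge it assumes a REINFORCE-with-baseline setup in which the advantage (sampled summary's reward minus a baseline summary's reward) is clipped at zero, so only samples that beat the baseline receive a gradient; this is one plausible reading of "dynamically selects the samples". For RISK it assumes an expected-risk objective: sequence probabilities are renormalised over a small candidate pool and the loss is the negative expected reward. Rewards (e.g. ROUGE scores) are taken as precomputed floats, and the function names are hypothetical.

```python
import math
from typing import Sequence


def rwb_hinge_weights(sample_rewards: Sequence[float], baseline_reward: float) -> list[float]:
    """Hypothetical per-sample weights for RwB-Hinge.

    Assumption: advantage = r(sample) - r(baseline), clipped at zero, so
    samples that do not beat the baseline get weight 0 and are skipped.
    """
    return [max(0.0, r - baseline_reward) for r in sample_rewards]


def rwb_hinge_loss(sample_logprobs: Sequence[float],
                   sample_rewards: Sequence[float],
                   baseline_reward: float) -> float:
    """Policy-gradient surrogate loss averaged over the sampled summaries.

    The weights act as constants; in a real model the log-probabilities
    would be tensors so that gradients flow through them.
    """
    weights = rwb_hinge_weights(sample_rewards, baseline_reward)
    return -sum(w * lp for w, lp in zip(weights, sample_logprobs)) / len(weights)


def risk_loss(candidate_logprobs: Sequence[float],
              candidate_rewards: Sequence[float]) -> float:
    """Hypothetical expected-risk loss over a small candidate pool.

    Assumption: sequence log-probabilities are renormalised with a
    softmax over the pool, and the loss is the negative expected reward.
    """
    m = max(candidate_logprobs)  # subtract the max for numerical stability
    exps = [math.exp(lp - m) for lp in candidate_logprobs]
    z = sum(exps)
    return -sum((e / z) * r for e, r in zip(exps, candidate_rewards))


if __name__ == "__main__":
    # Three sampled summaries scored against a greedy baseline reward of 0.4.
    print(rwb_hinge_loss([-1.0, -2.0, -3.0], [0.3, 0.5, 0.2], 0.4))
    # A pool of three candidates (e.g. from beam search) with their rewards.
    print(risk_loss([-1.2, -0.8, -2.0], [0.35, 0.42, 0.30]))
```

In this reading, the hinge makes the update sparse (only the second sample above contributes), while the RISK-style loss spreads credit across the whole pool in proportion to the model's renormalised probabilities.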

research
02/05/2021

Deceptive Reinforcement Learning for Privacy-Preserving Planning

In this paper, we study the problem of deceptive reinforcement learning ...
research
06/09/2020

Super-resolution Variational Auto-Encoders

The framework of variational autoencoders (VAEs) provides a principled m...
research
05/23/2017

Reinforcement Learning with a Corrupted Reward Channel

No real-world reward function is perfect. Sensory errors and software bu...
research
05/25/2021

A Comparison of Reward Functions in Q-Learning Applied to a Cart Position Problem

Growing advancements in reinforcement learning has led to advancements i...
research
02/03/2023

Learning Zero-Shot Cooperation with Humans, Assuming Humans Are Biased

There is a recent trend of applying multi-agent reinforcement learning (...
research
06/18/2016

On Reward Function for Survival

Obtaining a survival strategy (policy) is one of the fundamental problem...
research
07/13/2023

Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement

We explore the methodology and theory of reward-directed generation via ...
