Evaluating Rewards for Question Generation Models

02/28/2019
by Tom Hosking, et al.

Recent approaches to question generation have used modifications to a Seq2Seq architecture inspired by advances in machine translation. Models are trained using teacher forcing to optimise only the one-step-ahead prediction. However, at test time, the model is asked to generate a whole sequence, causing errors to propagate through the generation process (exposure bias). A number of authors have proposed countering this bias by optimising for a reward that is less tightly coupled to the training data, using reinforcement learning. We optimise directly for quality metrics, including a novel approach using a discriminator learned directly from the training data. We confirm that policy gradient methods can be used to decouple training from the ground truth, leading to increases in the metrics used as rewards. We perform a human evaluation, and show that although these metrics have previously been assumed to be good proxies for question quality, they are poorly aligned with human judgement and the model simply learns to exploit the weaknesses of the reward source.
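
As context for the policy-gradient training described above, below is a minimal sketch (not the authors' code) of a REINFORCE-style update for a Seq2Seq question generator: the model samples a question in free-running mode (the test-time regime rather than teacher forcing), a sequence-level reward is computed for the whole sample (e.g. an automatic quality metric or a learned discriminator score), and the sampled log-probabilities are weighted by that reward. The model interface (init_decoder, decode_step, start_token, end_token) and reward_fn are hypothetical placeholders, not APIs from the paper.

    # Minimal REINFORCE sketch for reward-based fine-tuning of a Seq2Seq generator.
    # All model methods and reward_fn are assumed placeholders for illustration only.
    import torch

    def reinforce_step(model, src_ids, reward_fn, optimizer, max_len=30):
        """One policy-gradient update: sample a question, score it, ascend the reward."""
        tokens, log_probs = [], []
        # Free-running decoding: condition each step on the model's own sample,
        # not on the ground-truth prefix (contrast with teacher forcing).
        state = model.init_decoder(src_ids)      # assumed encoder/decoder API
        tok = model.start_token()
        for _ in range(max_len):
            logits, state = model.decode_step(tok, state)
            dist = torch.distributions.Categorical(logits=logits)
            tok = dist.sample()
            log_probs.append(dist.log_prob(tok))
            tokens.append(tok)
            if tok.item() == model.end_token():
                break

        # Sequence-level reward, e.g. an automatic metric or a discriminator score.
        reward = reward_fn(tokens)

        # REINFORCE objective: maximise expected reward via -reward * sum(log-probs).
        loss = -reward * torch.stack(log_probs).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return reward

In practice such estimators are usually stabilised with a baseline (e.g. the reward of a greedily decoded sequence, as in self-critical sequence training), but the bare form above is enough to show how training is decoupled from the ground-truth tokens.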



Related research

11/02/2020 · Exploring Question-Specific Rewards for Generating Deep Questions
Recent question generation (QG) approaches often utilize the sequence-to...

10/07/2020 · TeaForN: Teacher-Forcing with N-grams
Sequence generation models trained with teacher-forcing suffer from issu...

02/22/2021 · Exploring Supervised and Unsupervised Rewards in Machine Translation
Reinforcement Learning (RL) is a powerful framework to address the discr...

09/16/2020 · Text Generation by Learning from Off-Policy Demonstrations
Current approaches to text generation largely rely on autoregressive mod...

12/02/2016 · Self-critical Sequence Training for Image Captioning
Recently it has been shown that policy-gradient methods for reinforcemen...

10/01/2019 · Generalization in Generation: A closer look at Exposure Bias
Exposure bias refers to the train-test discrepancy that seemingly arises...

11/10/2019 · Translationese as a Language in "Multilingual" NMT
Machine translation has an undesirable propensity to produce "translatio...
