Exploring Supervised and Unsupervised Rewards in Machine Translation

02/22/2021
by   Julia Ive, et al.

Reinforcement Learning (RL) is a powerful framework to address the discrepancy between loss functions used during training and the final evaluation metrics used at test time. When applied to neural Machine Translation (MT), it minimises the mismatch between the cross-entropy loss and non-differentiable evaluation metrics like BLEU. However, the suitability of these metrics as reward functions at training time is questionable: they tend to be sparse and biased towards the specific words used in the reference texts. We propose to address this problem by making models less reliant on such metrics in two ways: (a) with an entropy-regularised RL method that not only maximises a reward function but also explores the action space to avoid peaky distributions; (b) with a novel RL method that explores a dynamic unsupervised reward function to balance exploration and exploitation. We base our proposals on the Soft Actor-Critic (SAC) framework, adapting the off-policy maximum entropy model for language generation applications such as MT. We demonstrate that SAC with a BLEU reward tends to overfit less to the training data and performs better on out-of-domain data. We also show that our dynamic unsupervised reward can lead to better translation of ambiguous words.
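To make the entropy-regularised idea concrete, a minimal sketch of a REINFORCE-style loss with an entropy bonus is shown below. This is an illustrative simplification, not the paper's SAC implementation: the function name, the sequence-level reward signal (e.g. sentence BLEU), and the temperature `alpha` are assumptions for the sake of the example.

```python
import torch
import torch.nn.functional as F

def entropy_regularised_loss(logits, actions, rewards, alpha=0.01):
    """Policy-gradient loss with an entropy bonus (maximum-entropy RL).

    logits:  (batch, seq_len, vocab) unnormalised decoder scores
    actions: (batch, seq_len) sampled token ids
    rewards: (batch,) sequence-level reward, e.g. sentence BLEU
    alpha:   temperature weighting the entropy term
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # log-probability of each sampled token under the current policy
    action_logp = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    # per-token entropy of the policy; maximising it discourages
    # peaky output distributions
    entropy = -(log_probs.exp() * log_probs).sum(-1)
    # REINFORCE term weighted by the sequence reward, plus entropy bonus
    loss = -(rewards.unsqueeze(-1) * action_logp + alpha * entropy).mean()
    return loss
```

With `alpha = 0`, this reduces to plain reward maximisation; a positive `alpha` trades some reward for higher-entropy (more exploratory) output distributions, which is the behaviour the maximum-entropy framing is designed to encourage.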


Related research

07/30/2019 · Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation
Document summarisation can be formulated as a sequential decision-making...

03/16/2023 · Learning Rewards to Optimize Global Performance Metrics in Deep Reinforcement Learning
When applying reinforcement learning (RL) to a new problem, reward engin...

11/21/2018 · Neural Machine Translation with Adequacy-Oriented Learning
Although Neural Machine Translation (NMT) models have advanced state-of-...

02/28/2019 · Evaluating Rewards for Question Generation Models
Recent approaches to question generation have used modifications to a Se...

04/27/2020 · Neural Machine Translation with Monte-Carlo Tree Search
Recent algorithms in machine translation have included a value network t...

07/03/2019 · On the Weaknesses of Reinforcement Learning for Neural Machine Translation
Reinforcement learning (RL) is frequently used to increase performance i...

11/24/2018 · Connecting the Dots Between MLE and RL for Sequence Generation
Sequence generation models such as recurrent networks can be trained wit...
