Exploring Supervised and Unsupervised Rewards in Machine Translation

by Julia Ive et al.

Reinforcement Learning (RL) is a powerful framework for addressing the discrepancy between the loss functions used during training and the evaluation metrics used at test time. When applied to neural Machine Translation (MT), it minimises the mismatch between the cross-entropy loss and non-differentiable evaluation metrics like BLEU. However, the suitability of these metrics as reward functions at training time is questionable: they tend to be sparse and biased towards the specific words used in the reference texts. We propose to address this problem by making models less reliant on such metrics in two ways: (a) with an entropy-regularised RL method that not only maximises a reward function but also explores the action space to avoid peaky distributions; (b) with a novel RL method that explores a dynamic unsupervised reward function to balance exploration and exploitation. We base our proposals on the Soft Actor-Critic (SAC) framework, adapting the off-policy maximum entropy model for language generation applications such as MT. We demonstrate that SAC with a BLEU reward tends to overfit less to the training data and performs better on out-of-domain data. We also show that our dynamic unsupervised reward can lead to better translation of ambiguous words.
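The entropy-regularised objective the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: it simply shows, for a toy categorical "next-token" distribution, how adding an entropy bonus (weighted by a temperature `alpha`, a name chosen here for illustration) rewards flatter distributions and so discourages the peaky policies the abstract warns about.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def entropy_regularised_objective(logits, rewards, alpha=0.1):
    """Expected reward plus an entropy bonus, i.e. E_pi[R] + alpha * H(pi),
    for a categorical action distribution (e.g. next-token choice in MT).
    alpha trades off exploitation (reward) against exploration (entropy)."""
    probs = softmax(np.asarray(logits, dtype=float))
    expected_reward = float(probs @ np.asarray(rewards, dtype=float))
    entropy = float(-(probs * np.log(probs + 1e-12)).sum())
    return expected_reward + alpha * entropy

# With identical rewards, a uniform policy scores strictly higher than a
# peaky one, because only the entropy term differentiates them.
rewards = np.array([1.0, 1.0, 1.0])
uniform = entropy_regularised_objective([0.0, 0.0, 0.0], rewards)
peaky = entropy_regularised_objective([10.0, 0.0, 0.0], rewards)
print(uniform > peaky)  # True
```

In full maximum-entropy RL (as in SAC), this same entropy term appears inside the expected return at every step of the trajectory rather than as a one-shot bonus, but the trade-off it encodes is the one sketched here.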


