MAD for Robust Reinforcement Learning in Machine Translation

07/18/2022
by Domenic Donato, et al.

We introduce a new distributed policy gradient algorithm and show that it outperforms existing reward-aware training procedures such as REINFORCE, minimum risk training (MRT) and proximal policy optimization (PPO) in terms of training stability and generalization performance when optimizing machine translation models. Our algorithm, which we call MAD (on account of using the mean absolute deviation in the importance weighting calculation), has distributed data generators sampling multiple candidates per source sentence on worker nodes, while a central learner updates the policy. MAD depends crucially on two variance reduction strategies: (1) a conditional reward normalization method that ensures each source sentence has both positive and negative reward translation examples and (2) a new robust importance weighting scheme that acts as a conditional entropy regularizer. Experiments on a variety of translation tasks show that policies learned using the MAD algorithm perform very well when using both greedy decoding and beam search, and that the learned policies are sensitive to the specific reward used during training.
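The first variance reduction strategy can be illustrated with a minimal sketch. It assumes that conditional normalization means centering each candidate's reward on the mean reward of the candidates sampled for the same source sentence; the abstract does not give the exact formula, so this is one plausible reading and not the authors' implementation. The function names `center_rewards_per_source` and `mean_absolute_deviation` are hypothetical. The second helper only computes the statistic the algorithm is named after and does not reproduce the paper's importance weighting scheme.

```python
from statistics import mean


def center_rewards_per_source(rewards):
    """Center the rewards of all candidates sampled for ONE source sentence.

    Assumption: subtracting the per-source mean is used here as the
    normalization. Unless every candidate has the same reward, the result
    contains both positive and negative values.
    """
    mu = mean(rewards)
    return [r - mu for r in rewards]


def mean_absolute_deviation(values):
    """Return the mean absolute deviation of the values around their mean."""
    mu = mean(values)
    return mean([abs(v - mu) for v in values])


# Hypothetical rewards (e.g. sentence-level scores) for three candidate
# translations of the same source sentence.
candidate_rewards = [0.2, 0.5, 0.8]
centered = center_rewards_per_source(candidate_rewards)

# Spread of the hypothetical values, measured by mean absolute deviation.
spread = mean_absolute_deviation([1, 2, 3, 4])
```

Centering per source rather than over the whole batch means an easy source sentence, where every candidate scores highly, still contributes both reinforced and penalized examples to the update.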

