Bandit Structured Prediction for Learning from Partial Feedback in Statistical Machine Translation

01/18/2016
by Artem Sokolov, et al.

We present an approach to structured prediction from bandit feedback, called Bandit Structured Prediction, in which learning observes only the value of a task loss function at a single predicted point rather than a correct structure. We apply it to discriminative reranking in Statistical Machine Translation (SMT), where the learning algorithm has access only to a (1 − BLEU) loss evaluation of a predicted translation and never obtains a gold-standard reference translation. In our experiments, bandit feedback is obtained by evaluating BLEU against reference translations without revealing them to the algorithm. This can be thought of as a simulation of interactive machine translation, where an SMT system is personalized by a user who provides single-point feedback on predicted translations. Our experiments show that our approach improves translation quality and is comparable to approaches that employ more informative feedback in learning.
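The learning protocol described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a log-linear (Gibbs) model over a fixed candidate list, such as an n-best list, and a score-function estimate of the gradient of the expected loss. The names `bandit_update` and `run_toy`, the one-hot features, and the loss values are hypothetical and chosen only for illustration. The learner never sees which candidate is best; it only receives the loss of the one candidate it sampled.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def candidate_probs(w, feats):
    """Gibbs distribution over candidates under a linear model w . phi(y)."""
    return softmax([sum(wi * fi for wi, fi in zip(w, f)) for f in feats])


def bandit_update(w, feats, loss_fn, lr, rng):
    """One bandit step: sample a candidate, observe its loss, update w.

    feats   -- one feature vector per candidate (assumed fixed candidate list)
    loss_fn -- callable mapping a candidate index to a loss in [0, 1];
               stands in for the (1 - BLEU) feedback, references stay hidden
    """
    probs = candidate_probs(w, feats)
    k = rng.choices(range(len(feats)), weights=probs)[0]
    loss = loss_fn(k)  # the only supervision the learner receives
    # Expected feature vector under the current model.
    mean_feat = [sum(p * f[j] for p, f in zip(probs, feats))
                 for j in range(len(w))]
    # Score-function gradient estimate: loss * (phi(y_k) - E[phi]).
    new_w = [wi - lr * loss * (fk - mj)
             for wi, fk, mj in zip(w, feats[k], mean_feat)]
    return new_w, k, loss


def run_toy(steps=3000, lr=0.1, seed=0):
    """Toy run: three candidates with made-up losses hidden from the learner."""
    feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    hidden_loss = [0.9, 0.5, 0.1]  # hypothetical values
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        w, _k, _loss = bandit_update(w, feats, lambda k: hidden_loss[k], lr, rng)
    probs = candidate_probs(w, feats)
    expected_loss = sum(p * l for p, l in zip(probs, hidden_loss))
    return probs, expected_loss
```

Calling `run_toy()` returns the final candidate distribution and its expected loss. In this toy setting the model starts from a uniform distribution (expected loss 0.5) and should shift probability toward the low-loss candidate using single-point feedback alone.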


