Robust Tuning Datasets for Statistical Machine Translation

10/01/2017
by Preslav Nakov, et al.

We explore the idea of automatically crafting a tuning dataset for Statistical Machine Translation (SMT) that makes the hyper-parameters of the SMT system more robust with respect to specific deficiencies of the parameter tuning algorithms. This under-explored research direction can enable better parameter tuning. In this paper, we achieve this goal by selecting a subset of the available sentence pairs that is more suitable for specific combinations of optimizers, objective functions, and evaluation measures. We demonstrate the potential of the idea with the pairwise ranking optimization (PRO) optimizer, which is known to yield translations that are too short. We show that this learning problem can be alleviated by tuning on a subset of the development set, selected based on sentence length. In particular, using the longest 50% of the tuning sentences, we achieve a two-fold tuning speedup, and improvements in BLEU score that rival those of alternatives, which fix BLEU+1's smoothing instead.
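The length-based selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_longest`, the `fraction` parameter, and the choice to measure length in whitespace-separated tokens on the source side are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

# A tuning example: (source sentence, reference translation).
SentencePair = Tuple[str, str]


def select_longest(pairs: List[SentencePair], fraction: float = 0.5) -> List[SentencePair]:
    """Keep the longest `fraction` of sentence pairs, in their original order.

    Assumption: length is the number of whitespace-separated tokens in the
    source sentence; other length definitions are equally plausible.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not pairs:
        return []

    keep = max(1, math.ceil(len(pairs) * fraction))

    # Rank indices by source length, longest first. sorted() is stable,
    # so ties are resolved in favour of the earlier sentence.
    ranked = sorted(
        range(len(pairs)),
        key=lambda i: len(pairs[i][0].split()),
        reverse=True,
    )

    # Restore document order among the selected indices.
    chosen = sorted(ranked[:keep])
    return [pairs[i] for i in chosen]


if __name__ == "__main__":
    dev_set = [
        ("a", "x"),
        ("a b c d", "y"),
        ("a b", "z"),
        ("a b c", "w"),
    ]
    for source, reference in select_longest(dev_set, fraction=0.5):
        print(source, "|||", reference)
```

The resulting subset would then be passed to the tuning step in place of the full development set. Keeping the selected pairs in their original order is a design choice of this sketch, not something the abstract specifies.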


