
Using Mechanical Turk to Build Machine Translation Evaluation Sets

by Michael Bloodgood, et al.
Johns Hopkins University

Building machine translation (MT) test sets is a relatively expensive task. As MT becomes increasingly desired for more and more language pairs and domains, it becomes necessary to build test sets for each case. In this paper, we investigate using Amazon's Mechanical Turk (MTurk) to build MT test sets cheaply. We find that MTurk can be used to produce test sets at much lower cost than professionally produced ones. More importantly, in experiments with multiple MT systems, we find that the MTurk-produced test sets yield essentially the same conclusions regarding system performance as the professionally produced test sets do.
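The paper's central check, that system rankings agree whether evaluation uses the crowdsourced or the professional references, can be sketched in miniature. Below is a simplified corpus-level BLEU (clipped n-gram precisions with uniform weights up to bigrams, plus a brevity penalty) applied to invented toy data; the systems, sentences, and both reference sets are hypothetical illustrations, not the paper's data or its actual metric implementation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=2):
    """Simplified corpus BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n), scaled by a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        matched = total = 0
        for hyp, ref in zip(hypotheses, references):
            hyp_counts = ngrams(hyp.split(), n)
            ref_counts = ngrams(ref.split(), n)
            matched += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            total += sum(hyp_counts.values())
        precisions.append(matched / total if total else 0.0)
    if min(precisions) == 0.0:
        return 0.0
    hyp_len = sum(len(h.split()) for h in hypotheses)
    ref_len = sum(len(r.split()) for r in references)
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Hypothetical outputs from two MT systems, scored against two reference
# sets standing in for "professional" and "MTurk-produced" translations.
# All sentences below are invented for illustration.
refs_pro  = ["the cat sat on the mat", "he reads a book"]
refs_turk = ["the cat is sitting on the mat", "he is reading a book"]
system_a  = ["the cat sat on the mat", "he reads a book"]
system_b  = ["a cat on mat", "book he reads"]

for name, refs in [("professional", refs_pro), ("crowdsourced", refs_turk)]:
    scores = {s: corpus_bleu(outs, refs)
              for s, outs in [("A", system_a), ("B", system_b)]}
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(name, ranking)
```

On this toy data the ranking (system A above system B) is the same under both reference sets, which mirrors the kind of agreement the paper reports: absolute scores shift with the references, but the conclusion about which system is better does not.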



