Efficient Machine Translation Corpus Generation

06/20/2023
by Kamer Ali Yuksel, et al.

This paper proposes an efficient, semi-automated, human-in-the-loop post-editing method for generating machine translation (MT) corpora. The method trains a custom MT quality estimation metric online, as linguists perform post-edits. The online estimator is used to prioritize the worst hypotheses for post-editing and to auto-close the best hypotheses without post-editing. In this way, the quality of the resulting post-edits can be improved significantly at a lower cost, because less human involvement is needed. The trained estimator can also serve as an online sanity check for post-edits, removing the need for additional linguists to review them or to work on the same hypotheses. The paper compares the quality of the resulting MT corpus when hypotheses are prioritized with the proposed method against when they are scheduled randomly. Experiments show that the method improves the lifecycle of MT models by focusing linguist effort on the production samples and hypotheses that matter most for expanding the MT corpora used to re-train those models.
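The scheduling loop described in the abstract (estimate quality, queue the worst hypotheses for post-editing, auto-close the best, and update the estimator after each post-edit) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each hypothesis is represented by a hypothetical precomputed feature vector, the estimator is a plain linear model updated by one SGD step per post-edit, the training label is a TER-like score derived from the token edit distance between a hypothesis and its post-edit, and the auto-close threshold of 0.9 is an arbitrary choice.

```python
from typing import Dict, List, Tuple


def edit_distance(a: List[str], b: List[str]) -> int:
    """Token-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,                     # delete tok_a
                         cur[j - 1] + 1,                  # insert tok_b
                         prev[j - 1] + (tok_a != tok_b))  # substitute or match
        prev = cur
    return prev[-1]


def post_edit_quality(hypothesis: str, post_edit: str) -> float:
    """Assumed training label: 1 - (token edits / post-edit length), clipped at 0."""
    ref = post_edit.split()
    ratio = edit_distance(hypothesis.split(), ref) / max(len(ref), 1)
    return max(0.0, 1.0 - ratio)


class OnlineQE:
    """Hypothetical linear quality estimator, updated by one SGD step per post-edit."""

    def __init__(self, n_features: int, lr: float = 0.1, prior: float = 0.5):
        self.weights = [0.0] * n_features
        self.bias = prior  # initial quality guess before any post-edits are seen
        self.lr = lr

    def predict(self, x: List[float]) -> float:
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def update(self, x: List[float], target: float) -> None:
        # Gradient step on the squared error 0.5 * (prediction - target) ** 2.
        err = self.predict(x) - target
        self.weights = [w - self.lr * err * xi for w, xi in zip(self.weights, x)]
        self.bias -= self.lr * err


def schedule(model: OnlineQE, pool: Dict[str, List[float]],
             auto_close_at: float = 0.9) -> Tuple[List[str], List[str]]:
    """Return (post-editing queue sorted worst-first, auto-closed hypothesis ids)."""
    scored = sorted((model.predict(feats), hid) for hid, feats in pool.items())
    queue = [hid for score, hid in scored if score < auto_close_at]
    auto_closed = [hid for score, hid in scored if score >= auto_close_at]
    return queue, auto_closed


# Toy usage: linguists repeatedly post-edit two hypotheses; the estimator learns online.
model = OnlineQE(n_features=2)
pool = {"h_good": [1.0, 0.0], "h_bad": [0.0, 1.0], "h_mid": [0.5, 0.5]}
edits = {"h_good": ("the cat sat", "the cat sat"),  # accepted unchanged -> label 1.0
         "h_bad": ("x y z", "a b c d e")}           # fully rewritten -> label 0.0
for _ in range(200):
    for hid, (hyp, pe) in edits.items():
        model.update(pool[hid], post_edit_quality(hyp, pe))
queue, closed = schedule(model, pool)
print(queue, closed)  # ['h_bad', 'h_mid'] ['h_good']
```

A linear SGD model is used only because it keeps the example short and dependency-free; any estimator that supports incremental updates could take its place. The sketch does not cover the sanity-check use of the estimator mentioned in the abstract.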
