Simplify-then-Translate: Automatic Preprocessing for Black-Box Machine Translation

05/22/2020
by Sneha Mehta, et al.

Black-box machine translation systems have proven incredibly useful for a variety of applications, yet by design they are hard to adapt, tune to a specific domain, or build on top of. In this work, we introduce a method to improve such systems via automatic pre-processing (APP) using sentence simplification. We first propose a method to automatically generate a large in-domain paraphrase corpus through back-translation with a black-box MT system. This corpus is used to train a paraphrase model that "simplifies" the original sentence so that it is more conducive to translation. The model is used to preprocess source sentences of multiple low-resource language pairs. We show that this preprocessing leads to better translation performance than translating the non-preprocessed source sentences. We further perform side-by-side human evaluation to verify that translations of the simplified sentences are better than translations of the original ones. Finally, we provide guidance on recommended language pairs for generating the simplification model corpora. To do so, we investigate the relationship between the ease of translation of a language pair (as measured by BLEU) and the quality of the resulting simplification model trained on back-translations of this language pair (as measured by SARI), and tie this into the downstream task of low-resource translation.
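
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Translate` callable stands in for whatever black-box MT API is available, the function names `build_paraphrase_corpus` and `translate_with_app` are hypothetical, and the filter that drops round trips identical to the input is an added assumption rather than something the abstract states.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical interface to a black-box MT system:
# (text, source_language, target_language) -> translated text.
Translate = Callable[[str, str, str], str]
# Hypothetical interface to a trained simplification (paraphrase) model.
Simplify = Callable[[str], str]


def build_paraphrase_corpus(
    sentences: Iterable[str],
    translate: Translate,
    src: str = "en",
    pivot: str = "de",
) -> List[Tuple[str, str]]:
    """Create (original, back-translation) pairs via a pivot language."""
    pairs: List[Tuple[str, str]] = []
    for sentence in sentences:
        # Translate into the pivot language, then back into the source language.
        pivot_text = translate(sentence, src, pivot)
        round_trip = translate(pivot_text, pivot, src)
        # Assumption: skip empty outputs and round trips that merely copy the
        # input, since they carry no paraphrase signal.
        same = round_trip.strip().lower() == sentence.strip().lower()
        if round_trip.strip() and not same:
            pairs.append((sentence, round_trip))
    return pairs


def translate_with_app(
    sentence: str,
    simplify: Simplify,
    translate: Translate,
    src: str,
    tgt: str,
) -> str:
    """Simplify the source sentence first, then send it to the black-box system."""
    return translate(simplify(sentence), src, tgt)
```

The pairs returned by `build_paraphrase_corpus` would serve as training data for the simplification model, which is then passed to `translate_with_app` as `simplify`. Keeping both the MT system and the simplifier behind plain callables reflects the black-box setting: nothing in the sketch needs access to model internals.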


research
11/12/2021

BitextEdit: Automatic Bitext Editing for Improved Low-Resource Machine Translation

Mined bitexts can contain imperfect translations that yield unreliable t...
research
09/26/2017

Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages

In machine translation, we often try to collect resources to improve its...
research
10/07/2019

Automatic Testing and Improvement of Machine Translation

This paper presents TransRepair, a fully automatic approach for testing ...
research
04/30/2020

Imitation Attacks and Defenses for Black-box Machine Translation Systems

We consider an adversary looking to steal or attack a black-box machine ...
research
03/08/2019

Filling Gender & Number Gaps in Neural Machine Translation with Black-box Context Injection

When translating from a language that does not morphologically mark info...
research
09/15/2021

Beyond Glass-Box Features: Uncertainty Quantification Enhanced Quality Estimation for Neural Machine Translation

Quality Estimation (QE) plays an essential role in applications of Machi...
research
08/26/2019

uniblock: Scoring and Filtering Corpus with Unicode Block Information

The preprocessing pipelines in Natural Language Processing usually invol...
