Translationese as a Language in "Multilingual" NMT

11/10/2019
by Parker Riley, et al.

Machine translation has an undesirable propensity to produce "translationese" artifacts, which can lead to higher BLEU scores while being liked less by human raters. Motivated by this, we model translationese and original (i.e. natural) text as separate languages in a multilingual model, and pose the question: can we perform zero-shot translation between original source text and original target text? Since there is no parallel data with both original source and original target text, we train a sentence-level classifier to distinguish translationese from original target text, and use this classifier to tag the training data for an NMT model. Using this technique, we bias the model to produce more natural output at test time, yielding gains in human evaluation scores on both accuracy and fluency. Additionally, we demonstrate that it is possible to bias the model to produce translationese and game the BLEU score, increasing it while decreasing human-rated quality. We analyze these models using metrics that measure the degree of translationese in the output, and present an analysis of the capriciousness of heuristic-based train-data tagging.
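The tagging setup described above lends itself to a small illustration. Below is a minimal sketch, not the authors' implementation: it assumes a bag-of-words logistic regression as a stand-in for the paper's sentence-level translationese classifier, and uses illustrative tag tokens <2original> and <2translationese>; the actual classifier, features, and tag names are not specified in this abstract.

```python
# Minimal sketch (assumptions): a bag-of-words logistic regression stands in for
# the paper's translationese classifier, and <2original> / <2translationese>
# are illustrative tag tokens, not names taken from the paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy labeled target-language data: 1 = translationese, 0 = original (natural) text.
labeled_sentences = [
    ("this sentence was produced by translating from another language", 1),
    ("a perfectly ordinary sentence written by a native speaker", 0),
]
texts, labels = zip(*labeled_sentences)

vectorizer = CountVectorizer()
classifier = LogisticRegression()
classifier.fit(vectorizer.fit_transform(texts), labels)

def tag_pair(source: str, target: str) -> tuple[str, str]:
    """Prepend a tag token to the source based on the target side's predicted class."""
    is_translationese = classifier.predict(vectorizer.transform([target]))[0] == 1
    tag = "<2translationese>" if is_translationese else "<2original>"
    return f"{tag} {source}", target

# Each parallel training pair would be tagged this way based on its target side.
print(tag_pair("ein Beispielsatz", "an example sentence"))
```

At inference, forcing the <2original> tag biases decoding toward more natural output, while <2translationese> biases it toward translationese, which, per the abstract, can raise BLEU while lowering human-rated quality.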

Related research

Improving Zero-shot Multilingual Neural Machine Translation for Low-Resource Languages (10/02/2021)
Although the multilingual Neural Machine Translation (NMT), which extends...

Improving Multilingual Translation by Representation and Gradient Regularization (09/10/2021)
Multilingual Neural Machine Translation (NMT) enables one model to serve...

A neural interlingua for multilingual machine translation (04/23/2018)
We incorporate an explicit neural interlingua into a multilingual encode...

Enabling Zero-shot Multilingual Spoken Language Translation with Language-Specific Encoders and Decoders (11/02/2020)
Current end-to-end approaches to Spoken Language Translation (SLT) rely ...

Training Neural Machine Translation (NMT) Models using Tensor Train Decomposition on TensorFlow (T3F) (11/05/2019)
We implement a Tensor Train layer in the TensorFlow Neural Machine Trans...

Multilingual Conceptual Coverage in Text-to-Image Models (06/02/2023)
We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a techni...

Evaluating Rewards for Question Generation Models (02/28/2019)
Recent approaches to question generation have used modifications to a Se...
