Testing Machine Translation via Referential Transparency

04/22/2020
by Pinjia He, et al.

Machine translation software has progressed rapidly in recent years thanks to advances in deep neural networks. People routinely rely on it in their daily lives, for example when ordering food in a foreign restaurant, receiving medical diagnosis and treatment from foreign doctors, or reading international political news online. However, due to the complexity and intractability of the underlying neural networks, modern machine translation software is still far from robust. To address this problem, we introduce referentially transparent inputs (RTIs), a simple, widely applicable methodology for validating machine translation software. A referentially transparent input is a piece of text whose translation should remain invariant when it is used in different contexts. Our practical implementation, Purity, detects when a translation breaks this invariance property. To evaluate RTI, we use Purity to test Google Translate and Bing Microsoft Translator with 200 unlabeled sentences, which uncovered 123 and 142 erroneous translations, respectively, with high precision (79.3% and 78.3%). The translation errors are diverse, including under-translation, over-translation, word/phrase mistranslation, incorrect modification, and unclear logic. Such errors could lead to misunderstanding, financial loss, threats to personal safety and health, and political conflicts.
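The invariance property described in the abstract can be illustrated with a minimal sketch. This is not the Purity implementation. It assumes a simple bag-of-words comparison as the invariance check, and the names `missing_tokens`, `check_invariance`, and `toy_translate` are hypothetical. The translations are hand-written toy data, not output from Google Translate or Bing Microsoft Translator.

```python
from collections import Counter
from typing import Callable, List, Tuple


def missing_tokens(rti_translation: str, context_translation: str) -> int:
    """Count tokens of the RTI translation that are absent from the context translation."""
    rti_bag = Counter(rti_translation.lower().split())
    ctx_bag = Counter(context_translation.lower().split())
    # Counter subtraction keeps only tokens with a positive remaining count.
    return sum((rti_bag - ctx_bag).values())


def check_invariance(
    translate: Callable[[str], str],
    rti: str,
    contexts: List[str],
    threshold: int = 0,
) -> List[Tuple[str, int]]:
    """Return (context, distance) pairs where the RTI's translation is not preserved.

    Assumption: a distance above `threshold` marks a suspicious translation.
    """
    rti_translation = translate(rti)
    suspicious = []
    for context in contexts:
        if rti not in context:
            continue  # the phrase must actually occur in the containing text
        distance = missing_tokens(rti_translation, translate(context))
        if distance > threshold:
            suspicious.append((context, distance))
    return suspicious


def toy_translate(text: str) -> str:
    """Hypothetical English-to-French translator backed by a hand-written table."""
    table = {
        "the red car": "la voiture rouge",
        "she bought the red car": "elle a acheté la voiture rouge",
        # Deliberate under-translation: "rouge" is dropped.
        "the red car is fast": "la voiture est rapide",
    }
    return table[text]


if __name__ == "__main__":
    report = check_invariance(
        toy_translate,
        "the red car",
        ["she bought the red car", "the red car is fast"],
    )
    for context, distance in report:
        print(f"suspicious: {context!r} (missing tokens: {distance})")
```

In this toy run only the second context is flagged, because the token "rouge" from the phrase's standalone translation does not appear in the containing sentence's translation. A real checker would also need a way to extract candidate phrases and would have to tolerate legitimate reordering or inflection in the target language.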


research
07/19/2019

Structure-Invariant Testing for Machine Translation

In recent years, machine translation software has increasingly been inte...
research
03/15/2018

Achieving Human Parity on Automatic Chinese to English News Translation

Machine translation has made rapid advances in recent years. Millions of...
research
03/20/2023

Translate your gibberish: black-box adversarial attack on machine translation systems

Neural networks are deployed widely in natural language processing tasks...
research
10/05/2020

We Don't Speak the Same Language: Interpreting Polarization through Machine Translation

Polarization among US political parties, media and elites is a widely st...
research
06/07/2018

A Challenge Set for French --> English Machine Translation

We present a challenge set for French --> English machine translation ba...
research
06/23/2015

New Approach to translation of Isolated Units in English-Korean Machine Translation

It is the most effective way for quick translation of tremendous amount ...
research
11/02/2020

The 2020s Political Economy of Machine Translation

This paper explores the hypothesis that the diversity of human languages...
