Automatic Evaluation and Analysis of Idioms in Neural Machine Translation

10/10/2022
by Christos Baziotis, et al.

A major open problem in neural machine translation (NMT) is the translation of idiomatic expressions, such as "under the weather". The meaning of such expressions cannot be composed from the meanings of their constituent words, and NMT models tend to translate them literally (i.e., word by word), which leads to confusing and nonsensical translations. Research on idioms in NMT is limited, and it is obstructed by the absence of automatic methods for quantifying these errors. In this work, we first propose a novel metric for automatically measuring the frequency of literal translation errors without human involvement. Equipped with this metric, we present controlled translation experiments with models trained under different conditions (with or without the test-set idioms) and evaluated across a wide range of global and targeted metrics and test sets. We explore the role of monolingual pretraining and find that it yields substantial targeted improvements, even without observing any translation examples of the test-set idioms. In our analysis, we probe the role of idiom context. We find that the randomly initialized models are more local or "myopic": unlike the pretrained models, they are relatively unaffected by variations of the idiom context.


