Paying Attention to Multi-Word Expressions in Neural Machine Translation

10/17/2017
by Matīss Rikters, et al.

Processing multi-word expressions (MWEs) is a known problem for any natural language processing task, and even neural machine translation (NMT) struggles with it. This paper presents experiments that investigate how NMT allocates attention to MWEs and that aim to improve automated translation of sentences containing MWEs in English->Latvian and English->Czech NMT systems. Two improvement strategies were explored: (1) bilingual pairs of automatically extracted MWE candidates were added to the parallel corpus used to train the NMT system, and (2) full sentences containing the automatically extracted MWE candidates were added to the parallel corpus. Both approaches improved automated evaluation results. The best result, an increase of 0.99 BLEU points, was reached with the first approach, while the second approach achieved only minimal improvements. We also provide open-source software and tools used for MWE extraction and alignment inspection.
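The two augmentation strategies can be illustrated with a minimal sketch. This is not the authors' released software: the function names, the toy data, and the plain substring matching are assumptions made for illustration, and strategy (2) is assumed here to draw its extra sentences from the same parallel corpus, which the abstract does not specify.

```python
from typing import List, Tuple

# A parallel example is a (source, target) pair of strings.
Pair = Tuple[str, str]


def augment_with_mwe_pairs(corpus: List[Pair], mwe_pairs: List[Pair]) -> List[Pair]:
    """Strategy (1), as assumed here: append the bilingual MWE candidate
    pairs to the corpus as additional short training examples."""
    return corpus + mwe_pairs


def augment_with_mwe_sentences(corpus: List[Pair], mwe_pairs: List[Pair]) -> List[Pair]:
    """Strategy (2), as assumed here: append full sentence pairs in which
    some MWE candidate occurs on both the source and the target side.
    Matching is naive lowercase substring search (a simplification)."""
    extra = [
        (src, tgt)
        for src, tgt in corpus
        if any(
            m_src.lower() in src.lower() and m_tgt.lower() in tgt.lower()
            for m_src, m_tgt in mwe_pairs
        )
    ]
    return corpus + extra


if __name__ == "__main__":
    # Hypothetical toy data; target-side tokens are placeholders, not real translations.
    corpus = [
        ("he kicked the bucket yesterday", "t1 t2 t3 t4"),
        ("the cat sleeps", "t5 t6"),
    ]
    mwe_pairs = [("kicked the bucket", "t2 t3")]

    print(len(augment_with_mwe_pairs(corpus, mwe_pairs)))
    print(len(augment_with_mwe_sentences(corpus, mwe_pairs)))
```

In either case the augmented list would then be written out as the training data for the NMT system; how the added examples are ordered or weighted relative to the original corpus is a design choice this sketch leaves open.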

