Facilitating Terminology Translation with Target Lemma Annotations

01/25/2021
by Toms Bergmanis, et al.

Most of the recent work on terminology integration in machine translation has assumed that terminology translations are given already inflected in forms that are suitable for the target language sentence. In the day-to-day work of professional translators, however, this is seldom the case, as translators work with bilingual glossaries where terms are given in their dictionary forms; finding the right target language form is part of the translation process. We argue that the requirement for a priori specified target language forms is unrealistic and impedes the practical applicability of previous work. In this work, we propose to train machine translation systems using a source-side data augmentation method that annotates randomly selected source language words with their target language lemmas. We show that systems trained on such augmented data are readily usable for terminology integration in real-life translation scenarios. Our experiments on terminology translation into the morphologically complex Baltic and Uralic languages show an improvement of up to 7 BLEU points over baseline systems with no means for terminology integration and an average improvement of 4 BLEU points over the previous work. Results of the human evaluation indicate a 47.7 translation accuracy when translating into Latvian.
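To make the augmentation idea concrete, here is a minimal Python sketch of annotating randomly selected source words with the lemmas of the target words they are aligned to. Everything in it is an assumption for illustration, not the authors' implementation: the word alignment is taken as given (e.g. produced by an external aligner), target lemmas are assumed to be precomputed by some lemmatizer, only one-to-one links are annotated, and the inline `<t> ... </t>` markup, the function name `annotate_source`, and the default selection probability are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def annotate_source(
    source_tokens: Sequence[str],
    target_lemmas: Sequence[str],
    alignment: Sequence[Tuple[int, int]],
    prob: float = 0.1,  # hypothetical selection probability
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Insert target-language lemmas after randomly chosen source words.

    Hypothetical sketch: `alignment` holds (source_index, target_index)
    pairs, and `target_lemmas` holds the lemma of each target token.
    The '<t> lemma </t>' markup is an assumed, not official, format.
    """
    rng = rng if rng is not None else random.Random()

    # Group target positions by the source position they align to.
    links: Dict[int, List[int]] = {}
    for src_i, tgt_j in alignment:
        links.setdefault(src_i, []).append(tgt_j)

    out: List[str] = []
    for i, tok in enumerate(source_tokens):
        out.append(tok)
        tgt_positions = links.get(i, [])
        # Simplification (assumption): only annotate one-to-one links,
        # and only with probability `prob`.
        if len(tgt_positions) == 1 and rng.random() < prob:
            out.extend(["<t>", target_lemmas[tgt_positions[0]], "</t>"])
    return out


if __name__ == "__main__":
    src = ["the", "cat", "sleeps"]
    lemmas = ["kaķis", "gulēt"]  # lemmas of Latvian "kaķis guļ"
    align = [(1, 0), (2, 1)]
    print(" ".join(annotate_source(src, lemmas, align, prob=1.0)))
```

With `prob=1.0` every one-to-one aligned word is annotated, so the example prints `the cat <t> kaķis </t> sleeps <t> gulēt </t>`. Because the model only ever sees dictionary forms in the annotations, it has to learn to inflect them itself; at translation time, the same markup could be filled with lemmas taken from glossary entries instead of from alignments.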
