Domain-Specific Text Generation for Machine Translation

08/11/2022
by Yasmin Moslem, et al.

Preserving domain knowledge from source to target is crucial in any translation workflow. The translation industry commonly receives highly specialized projects for which there is hardly any parallel in-domain data. In such scenarios, where there is insufficient in-domain data to fine-tune Machine Translation (MT) models, producing translations that are consistent with the relevant context is challenging. In this work, we propose a novel approach to domain adaptation that leverages state-of-the-art pretrained language models (LMs) for domain-specific data augmentation for MT, simulating the domain characteristics of either (a) a small bilingual dataset, or (b) the monolingual source text to be translated. Combining this idea with back-translation, we can generate large amounts of synthetic bilingual in-domain data for both use cases. For our investigation, we use the state-of-the-art Transformer architecture. We employ mixed fine-tuning to train models that significantly improve translation of in-domain texts. More specifically, in both scenarios, our proposed methods improve BLEU by approximately 5-6 points on the Arabic-to-English language pair and 2-3 points on the English-to-Arabic language pair. Furthermore, the outcome of human evaluation corroborates the automatic evaluation results.
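
The data flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the authors' implementation: `synthesize_pairs`, `mix_for_fine_tuning`, the toy stand-ins for the language model and the back-translation system, and the `oversample` factor are all hypothetical names and assumptions introduced here. In particular, the sketch assumes that in-domain target-side sentences are used as prompts for the LM, and that mixed fine-tuning combines generic data with oversampled in-domain data.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def synthesize_pairs(
    seed_targets: List[str],
    generate: Callable[[str, int], List[str]],
    back_translate: Callable[[str], str],
    per_seed: int = 2,
) -> List[Pair]:
    """Build synthetic in-domain pairs.

    Each in-domain target sentence prompts an LM (`generate`, hypothetical)
    for new target-side text, which a target-to-source MT system
    (`back_translate`, hypothetical) turns into a synthetic source side.
    """
    pairs: List[Pair] = []
    for seed in seed_targets:
        for synthetic_target in generate(seed, per_seed):
            synthetic_source = back_translate(synthetic_target)
            pairs.append((synthetic_source, synthetic_target))
    return pairs


def mix_for_fine_tuning(
    generic: List[Pair],
    in_domain: List[Pair],
    oversample: int = 3,
    seed: int = 0,
) -> List[Pair]:
    """Combine generic data with oversampled in-domain data, then shuffle.

    The oversampling factor is an assumption for illustration only.
    """
    mixed = list(generic) + list(in_domain) * oversample
    random.Random(seed).shuffle(mixed)
    return mixed


# Toy stand-ins so the sketch runs without any model; a real setup would
# call a pretrained LM and a trained MT model here instead.
def toy_generate(prompt: str, n: int) -> List[str]:
    return [f"{prompt} (variant {i})" for i in range(n)]


def toy_back_translate(text: str) -> str:
    return f"[src] {text}"


synthetic = synthesize_pairs(
    ["The patient received a second dose."], toy_generate, toy_back_translate
)
generic_data = [("generic source", "generic target")] * 4
training_data = mix_for_fine_tuning(generic_data, synthetic)
```

For use case (b), where only the source text is available, a similar flow would presumably need a forward-translation step first to obtain target-side prompts; that step is omitted from this sketch.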

research
01/30/2023

Adaptive Machine Translation with Large Language Models

Consistency is a key requirement of high-quality translation. It is espe...
research
11/30/2020

Machine Translation of Novels in the Age of Transformer

In this chapter we build a machine translation (MT) system tailored to t...
research
01/14/2017

QCRI Machine Translation Systems for IWSLT 16

This paper describes QCRI's machine translation systems for the IWSLT 20...
research
05/28/2021

Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation

Recent progress in neural machine translation (NMT) has made it possible...
research
11/05/2020

Data Augmentation and Terminology Integration for Domain-Specific Sinhala-English-Tamil Statistical Machine Translation

Out of vocabulary (OOV) is a problem in the context of Machine Translati...
research
10/01/2020

Nearest Neighbor Machine Translation

We introduce k-nearest-neighbor machine translation (kNN-MT), which pred...
research
02/20/2021

Machine Translation Customization via Automatic Training Data Selection from the Web

Machine translation (MT) systems, especially when designed for an indust...
