Finding the Right Recipe for Low Resource Domain Adaptation in Neural Machine Translation

06/02/2022
by Virginia Adams, et al.

General translation models often still struggle to generate accurate translations in specialized domains. To guide machine translation practitioners and characterize the effectiveness of domain adaptation methods under different data availability scenarios, we conduct an in-depth empirical exploration of monolingual and parallel data approaches to domain adaptation of pre-trained, third-party NMT models in settings where architecture change is impractical. We compare data-centric adaptation methods in isolation and in combination. We study method effectiveness in very low-resource (8k parallel examples) and moderately low-resource (46k parallel examples) conditions, and we propose an ensemble approach to alleviate reductions in original-domain translation quality. Our work covers three domains (consumer electronics, clinical, and biomedical) and spans four language pairs: Zh-En, Ja-En, Es-En, and Ru-En. We also make concrete recommendations for achieving high in-domain performance, release our consumer electronics and medical domain datasets for all languages, and make our code publicly available.

research
01/29/2021

Synthesizing Monolingual Data for Neural Machine Translation

In neural machine translation (NMT), monolingual data in the target lang...
research
11/11/2022

Hardness-guided domain adaptation to recognise biomedical named entities under low-resource scenarios

Domain adaptation is an effective solution to data scarcity in low-resou...
research
07/06/2019

Exploiting Out-of-Domain Parallel Data through Multilingual Transfer Learning for Low-Resource Neural Machine Translation

This paper proposes a novel multilingual multistage fine-tuning approach...
research
10/23/2020

Rapid Domain Adaptation for Machine Translation with Monolingual Data

One challenge of machine translation is how to quickly adapt to unseen d...
research
04/30/2020

Addressing Zero-Resource Domains Using Document-Level Context in Neural Machine Translation

Achieving satisfying performance in machine translation on domains for w...
research
04/20/2022

DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation

Domain Adaptation (DA) of Neural Machine Translation (NMT) model often r...
research
12/20/2022

Localising In-Domain Adaptation of Transformer-Based Biomedical Language Models

In the era of digital healthcare, the huge volumes of textual informatio...
