Decoding Time Lexical Domain Adaptation for Neural Machine Translation

01/02/2021
by Nikolay Bogoychev, et al.

Machine translation systems are vulnerable to domain mismatch, especially in low-resource settings. Out-of-domain translations are often of poor quality and prone to hallucinations, because the translation model prefers to predict the common words it has seen during training over the rarer words of a different domain. We present two simple methods for improving translation quality in this setting: First, we use lexical shortlisting to restrict the neural network's predictions to candidates derived from IBM-model alignments. Second, we rerank the n-best list, scoring each translation by how much it overlaps with the other hypotheses. Our methods are computationally simpler and faster than alternative approaches, and show moderate success in low-resource settings with explicit out-of-domain test sets. However, they lose their effectiveness when the domain mismatch is too great, or in high-resource settings.
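Both methods lend themselves to short sketches. The first is a minimal Python sketch of lexical shortlisting, assuming a lexical translation table of top-k target candidates per source word (e.g. extracted from IBM-model alignments with a tool such as fast_align) is available as lex_table; the function names and the logit-masking interface are illustrative assumptions, not the paper's implementation.

import numpy as np

def build_shortlist(src_tokens, lex_table, always_allowed):
    """Union of alignment-derived target candidates for each source
    token, plus a small always-allowed set (frequent words, EOS, ...)."""
    allowed = set(always_allowed)
    for tok in src_tokens:
        allowed.update(lex_table.get(tok, ()))
    return allowed

def mask_logits(logits, shortlist):
    """Restrict the decoder's next-token distribution to the shortlist
    by setting every other vocabulary entry to -inf before the softmax."""
    masked = np.full_like(logits, -np.inf)
    idx = list(shortlist)
    masked[idx] = logits[idx]
    return masked

The second sketch is the overlap-based n-best reranking. The abstract does not specify the overlap measure, so Jaccard similarity over token sets is an assumption here; any symmetric hypothesis-similarity score would slot in the same way.

def overlap(hyp_a, hyp_b):
    """Jaccard overlap between two tokenised hypotheses."""
    a, b = set(hyp_a), set(hyp_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def rerank_nbest(nbest):
    """Order hypotheses by mean overlap with the rest of the list:
    translations that agree with many others are promoted, while
    outliers (potential hallucinations) sink to the bottom."""
    def score(h):
        others = [o for o in nbest if o is not h]
        return sum(overlap(h, o) for o in others) / max(len(others), 1)
    return sorted(nbest, key=score, reverse=True)

In both sketches the translation model itself is untouched: shortlisting only masks the output layer at each decoding step, and reranking only reorders an existing n-best list, which is consistent with the abstract's claim that the methods are computationally cheap at decoding time.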


Related research

06/01/2022
Exploring Diversity in Back Translation for Low-Resource Machine Translation
Back translation is one of the most widely used methods for improving th...

03/20/2021
The Effectiveness of Morphology-aware Segmentation in Low-Resource Neural Machine Translation
This paper evaluates the performance of several modern subword segmentat...

11/08/2019
Domain Robustness in Neural Machine Translation
Translating text that diverges from the training domain is a key challen...

04/05/2023
Unleashing the Power of ChatGPT for Translation: An Empirical Study
The recently released ChatGPT has demonstrated surprising abilities in n...

11/12/2021
BitextEdit: Automatic Bitext Editing for Improved Low-Resource Machine Translation
Mined bitexts can contain imperfect translations that yield unreliable t...

06/07/2021
Lexicon Learning for Few-Shot Neural Sequence Modeling
Sequence-to-sequence transduction is the core problem in language proces...

10/06/2021
The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation
A "bigger is better" explosion in the number of parameters in deep neura...
