DaLC: Domain Adaptation Learning Curve Prediction for Neural Machine Translation

04/20/2022
by Cheonbok Park, et al.

Domain Adaptation (DA) of a Neural Machine Translation (NMT) model often relies on a pre-trained general NMT model that is adapted to the new domain using a sample of in-domain parallel data. Without parallel data, there is no way to estimate the potential benefit of DA, nor the number of parallel samples it would require. Such an estimate is, however, desirable: it could help MT practitioners make an informed decision before investing resources in dataset creation. We propose a Domain adaptation Learning Curve prediction (DaLC) model that predicts prospective DA performance based on in-domain monolingual samples in the source language. Our model relies on the NMT encoder representations combined with various instance-level and corpus-level features. We demonstrate that the instance-level framework distinguishes between different domains better than the corpus-level frameworks proposed in previous studies. Finally, we perform in-depth analyses of the results, highlighting the limitations of our approach, and provide directions for future research.

