Tailoring Domain Adaptation for Machine Translation Quality Estimation

While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking because of the high cost and effort of labeling. Beyond data scarcity, QE models should also generalize, i.e., they should be able to handle data from different domains, both generic and specific. To alleviate these two main issues (data scarcity and domain mismatch), this paper combines domain adaptation and data augmentation within a robust QE system. Our method first trains a generic QE model and then fine-tunes it on a specific domain while retaining generic knowledge. Our results show a significant improvement for all the language pairs investigated, better cross-lingual inference, and superior performance in zero-shot learning scenarios compared to state-of-the-art baselines.
