FDMT: A Benchmark Dataset for Fine-grained Domain Adaptation in Machine Translation

12/31/2020
by   Wenhao Zhu, et al.
0

Previous domain adaptation research usually neglect the diversity in translation within a same domain, which is a core problem for adapting a general neural machine translation (NMT) model into a specific domain in real-world scenarios. One representative of such challenging scenarios is to deploy a translation system for a conference with a specific topic, e.g. computer networks or natural language processing, where there is usually extremely less resources due to the limited time schedule. To motivate a wide investigation in such settings, we present a real-world fine-grained domain adaptation task in machine translation (FDMT). The FDMT dataset (Zh-En) consists of four sub-domains of information technology: autonomous vehicles, AI education, real-time networks and smart phone. To be closer to reality, FDMT does not employ any in-domain bilingual training data. Instead, each sub-domain is equipped with monolingual data, bilingual dictionary and knowledge base, to encourage in-depth exploration of these available resources. Corresponding development set and test set are provided for evaluation purpose. We make quantitative experiments and deep analyses in this new setting, which benchmarks the fine-grained domain adaptation task and reveals several challenging problems that need to be addressed.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/01/2018

A Survey of Domain Adaptation for Neural Machine Translation

Neural machine translation (NMT) is a deep learning based approach for m...
research
04/14/2021

Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey

The development of deep learning techniques has allowed Neural Machine T...
research
12/20/2016

Fast Domain Adaptation for Neural Machine Translation

Neural Machine Translation (NMT) is a new approach for automatic transla...
research
10/13/2022

M2D2: A Massively Multi-domain Language Modeling Dataset

We present M2D2, a fine-grained, massively multi-domain corpus for study...
research
02/21/2022

Domain Adaptation in Neural Machine Translation using a Qualia-Enriched FrameNet

In this paper we present Scylla, a methodology for domain adaptation of ...
research
02/22/2020

Machine Translation System Selection from Bandit Feedback

Adapting machine translation systems in the real world is a difficult pr...
research
08/31/2017

Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation

One weakness of machine-learned NLP models is that they typically perfor...

Please sign up or login with your details

Forgot password? Click here to reset