
FDMT: A Benchmark Dataset for Fine-grained Domain Adaptation in Machine Translation

12/31/2020
by Wenhao Zhu, et al.

Previous domain adaptation research usually neglects the diversity of translation within a single domain, which is a core problem when adapting a general neural machine translation (NMT) model to a specific domain in real-world scenarios. One representative example of such a challenging scenario is deploying a translation system for a conference with a specific topic, e.g. computer networks or natural language processing, where resources are usually extremely scarce because of the limited time schedule. To motivate wide investigation of such settings, we present a real-world fine-grained domain adaptation task in machine translation (FDMT). The FDMT dataset (Zh-En) consists of four sub-domains of information technology: autonomous vehicles, AI education, real-time networks, and smartphones. To be closer to reality, FDMT does not employ any in-domain bilingual training data. Instead, each sub-domain is equipped with monolingual data, a bilingual dictionary, and a knowledge base, to encourage in-depth exploration of these available resources. A corresponding development set and test set are provided for evaluation. We conduct quantitative experiments and in-depth analyses in this new setting, which benchmark the fine-grained domain adaptation task and reveal several challenging problems that need to be addressed.


06/01/2018

A Survey of Domain Adaptation for Neural Machine Translation

Neural machine translation (NMT) is a deep learning based approach for m...
04/14/2021

Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey

The development of deep learning techniques has allowed Neural Machine T...
12/20/2016

Fast Domain Adaptation for Neural Machine Translation

Neural Machine Translation (NMT) is a new approach for automatic transla...
10/13/2022

M2D2: A Massively Multi-domain Language Modeling Dataset

We present M2D2, a fine-grained, massively multi-domain corpus for study...
02/21/2022

Domain Adaptation in Neural Machine Translation using a Qualia-Enriched FrameNet

In this paper we present Scylla, a methodology for domain adaptation of ...
02/22/2020

Machine Translation System Selection from Bandit Feedback

Adapting machine translation systems in the real world is a difficult pr...
08/31/2017

Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation

One weakness of machine-learned NLP models is that they typically perfor...