Domain Adaptation of Machine Translation with Crowdworkers

10/28/2022
by Makoto Morishita, et al.

Although a machine translation model trained on a large in-domain parallel corpus achieves remarkable results, it still performs poorly when no in-domain data are available. This restricts the applicability of machine translation to domains where data are plentiful, even though there is great demand for high-quality domain-specific translation models in many domains. We propose a framework that efficiently and effectively collects parallel sentences in a target domain from the web with the help of crowdworkers. With the collected parallel data, we can quickly adapt a machine translation model to the target domain. Our experiments show that the proposed method can collect target-domain parallel data within a few days at a reasonable cost. We tested it on five domains, and the domain-adapted model improved BLEU scores by up to +19.7 points, and by +7.8 points on average, compared to a general-purpose translation model.
