Incorporating Word and Subword Units in Unsupervised Machine Translation Using Language Model Rescoring

08/16/2019
by Zihan Liu, et al.

This paper describes CAiRE's submission to the unsupervised machine translation track of the WMT'19 news shared task from German to Czech. We use a phrase-based statistical machine translation (PBSMT) model and a pre-trained language model to combine word-level and subword-level neural machine translation (NMT) models without using any parallel data. To address the rich morphology of the two languages, we train byte-pair encoding (BPE) embeddings for German and Czech separately and align them using MUSE (Conneau et al., 2018). To ensure the fluency and consistency of translations, we propose a rescoring mechanism that reuses the pre-trained language model to select among the translation candidates generated through beam search. In addition, a series of pre-processing and post-processing steps is applied to improve the quality of the final translations.
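
The rescoring step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy add-one-smoothed bigram model below (the hypothetical `BigramLM` class) merely stands in for the pre-trained language model, and the length-normalized log-probability used as the selection criterion is an assumption, since the abstract does not state the exact scoring function. The candidate list plays the role of the hypotheses produced by beam search.

```python
import math
from collections import Counter


class BigramLM:
    """Toy add-one-smoothed bigram LM (hypothetical stand-in for a pre-trained LM)."""

    def __init__(self, sentences):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # Reserve one extra vocabulary slot for unseen tokens.
        self.vocab_size = len(self.unigrams) + 1

    def avg_log_prob(self, sentence):
        """Average per-token log-probability; higher means more fluent under the LM."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        # Normalize by the number of predicted tokens so longer
        # candidates are not penalized for length alone.
        return total / (len(tokens) - 1)


def rescore(candidates, lm):
    """Return the candidate that the language model scores highest."""
    return max(candidates, key=lm.avg_log_prob)


# Toy monolingual data in the target language (illustrative only).
lm = BigramLM(["the dog runs home", "the cat sleeps here", "the dog sleeps here"])

# Hypothetical beam-search outputs for one source sentence.
beam_candidates = ["the dog sleeps here", "here the dog sleeps sleeps"]
best = rescore(beam_candidates, lm)
print(best)
```

Normalizing by length is one common design choice for this kind of selection; an unnormalized sum of log-probabilities would systematically favor shorter candidates.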

research
10/11/2020

SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task

In this paper, we introduced our joint team SJTU-NICT's participation i...
research
12/02/2019

Merging External Bilingual Pairs into Neural Machine Translation

As neural machine translation (NMT) is not easily amenable to explicit c...
research
11/10/2019

Language Model-Driven Unsupervised Neural Machine Translation

Unsupervised neural machine translation (NMT) is associated with noise an...
research
02/28/2020

Do all Roads Lead to Rome? Understanding the Role of Initialization in Iterative Back-Translation

Back-translation provides a simple yet effective approach to exploit mon...
research
06/14/2018

Morphological and Language-Agnostic Word Segmentation for NMT

The state of the art of handling rich morphology in neural machine trans...
research
05/27/2023

Augmenting Large Language Model Translators via Translation Memories

Using translation memories (TMs) as prompts is a promising approach to i...
research
10/19/2022

Separating Grains from the Chaff: Using Data Filtering to Improve Multilingual Translation for Low-Resourced African Languages

We participated in the WMT 2022 Large-Scale Machine Translation Evaluati...
