Bridging Neural Machine Translation and Bilingual Dictionaries

10/24/2016
by Jiajun Zhang, et al.

Neural Machine Translation (NMT) has become the new state of the art for several language pairs. However, it remains challenging to integrate NMT with a bilingual dictionary, which mainly contains words that are rare or absent in the bilingual training data. In this paper, we propose two methods to bridge NMT and bilingual dictionaries. The core idea is to design novel models that transform the bilingual dictionaries into adequate sentence pairs, so that NMT can distil latent bilingual mappings from the ample and repetitive phenomena. One method leverages a mixed word/character model, and the other synthesizes parallel sentences that guarantee massive occurrence of the translation lexicon. Extensive experiments demonstrate that the proposed methods can remarkably improve translation quality, and that most of the rare words in the test sentences obtain correct translations if they are covered by the dictionary.
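To make the first method more concrete, here is a minimal Python sketch of what a mixed word/character preprocessing step could look like: frequent words stay whole, while rare words are split into characters so that dictionary entries and training sentences share the same sub-word units. The abstract does not specify the details, so the frequency threshold, the `<B>`/`<M>`/`<E>` position tags, and the function names `build_vocab` and `to_mixed` are all assumptions made for illustration.

```python
from collections import Counter


def build_vocab(sentences, min_count):
    """Collect whitespace-separated tokens seen at least min_count times.

    min_count is an assumed hyperparameter, not a value from the paper.
    """
    counts = Counter(tok for sent in sentences for tok in sent.split())
    return {tok for tok, c in counts.items() if c >= min_count}


def to_mixed(sentence, vocab):
    """Keep frequent words whole; spell out rare words character by character.

    The <B>/<M>/<E> tags (begin/middle/end of a word) are a hypothetical
    marking scheme so the original word boundaries can be restored later.
    """
    out = []
    for tok in sentence.split():
        if tok in vocab or len(tok) == 1:
            out.append(tok)
        else:
            chars = list(tok)
            out.append("<B>" + chars[0])
            out.extend("<M>" + c for c in chars[1:-1])
            out.append("<E>" + chars[-1])
    return " ".join(out)


# Toy corpus: only "the" and "cat" reach the assumed threshold of 2.
corpus = ["the cat sat", "the cat ran", "the oak fell"]
vocab = build_vocab(corpus, min_count=2)
mixed = to_mixed("the oak fell", vocab)
print(mixed)
```

Under this sketch, a dictionary entry whose source or target word is rare would be passed through `to_mixed` as well, so that it becomes a short character-level "sentence pair" that can be appended to the training data.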


