Resolving Out-of-Vocabulary Words with Bilingual Embeddings in Machine Translation

Out-of-vocabulary words account for a large proportion of errors in machine translation systems, especially when a system is applied to a domain different from the one it was trained on. To alleviate this problem, we propose a log-bilinear, softmax-based model for vocabulary expansion: given an out-of-vocabulary source word, the model generates a probabilistic list of possible translations in the target language. Our model uses only word embeddings trained on large unlabelled monolingual corpora, and it is trained on a fairly small word-to-word bilingual dictionary. We feed this probabilistic list into a standard phrase-based statistical machine translation system and obtain consistent improvements in translation quality on the English-Spanish language pair. In particular, we obtain an improvement of 3.9 BLEU points on an out-of-domain test set.
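
The abstract does not spell out the model's equations. As a rough illustration only, the sketch below assumes a common log-bilinear form, P(t | s) proportional to exp(x_s^T W y_t), where x_s and y_t are fixed pretrained source and target embeddings and only the matrix W is learned from dictionary pairs by maximizing log-likelihood. This parameterization, the function names (bilinear_score, train, translate_oov), the learning rate, and the toy two-dimensional embeddings are all assumptions for illustration, not the authors' implementation. It shows how an OOV source word that has an embedding but no dictionary entry still receives a ranked, probabilistic list of target candidates.

```python
import math
import random

def bilinear_score(x, W, y):
    """Log-bilinear score x^T W y for a source vector x and a target vector y."""
    return sum(x[i] * W[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def translation_probs(x, W, tgt_list):
    """P(t | s) for every target word t, given the source embedding x."""
    return softmax([bilinear_score(x, W, y) for y in tgt_list])

def train(pairs, src_embs, tgt_embs, tgt_vocab, lr=0.5, epochs=200, seed=0):
    """Fit W (assumed parameterization) by SGD ascent on the dictionary log-likelihood."""
    rng = random.Random(seed)
    dim_s = len(next(iter(src_embs.values())))
    dim_t = len(next(iter(tgt_embs.values())))
    W = [[rng.uniform(-0.01, 0.01) for _ in range(dim_t)] for _ in range(dim_s)]
    tgt_list = [tgt_embs[w] for w in tgt_vocab]
    for _ in range(epochs):
        for s, t in pairs:
            x, y_true = src_embs[s], tgt_embs[t]
            probs = translation_probs(x, W, tgt_list)
            # Expected target embedding under the current model.
            y_exp = [sum(p * y[j] for p, y in zip(probs, tgt_list)) for j in range(dim_t)]
            # Gradient of log P(t | s) with respect to W is outer(x, y_true - y_exp).
            for i in range(dim_s):
                for j in range(dim_t):
                    W[i][j] += lr * x[i] * (y_true[j] - y_exp[j])
    return W

def translate_oov(word, src_embs, tgt_embs, tgt_vocab, W, k=3):
    """Return the k most probable target translations with their probabilities."""
    probs = translation_probs(src_embs[word], W, [tgt_embs[w] for w in tgt_vocab])
    return sorted(zip(tgt_vocab, probs), key=lambda wp: wp[1], reverse=True)[:k]

# Hypothetical toy embeddings; "puppy" is OOV: it has an embedding but no dictionary entry.
src_embs = {"dog": [1.0, 0.0], "cat": [0.0, 1.0], "puppy": [0.9, 0.1]}
tgt_embs = {"perro": [1.0, 0.0], "gato": [0.0, 1.0]}
tgt_vocab = ["perro", "gato"]
pairs = [("dog", "perro"), ("cat", "gato")]

W = train(pairs, src_embs, tgt_embs, tgt_vocab)
candidates = translate_oov("puppy", src_embs, tgt_embs, tgt_vocab, W, k=2)
print(candidates)
```

In this sketch only W is learned, so its size depends on the embedding dimensions rather than on the vocabulary, while the normalizing sum in the softmax runs over the whole target vocabulary and dominates the cost for realistic vocabulary sizes.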

research
05/10/2016

Vocabulary Manipulation for Neural Machine Translation

In order to capture rich language phenomena, neural machine translation ...
research
02/05/2015

Beyond Word-based Language Model in Statistical Machine Translation

Language model is one of the most important modules in statistical machi...
research
12/10/2018

Von Mises-Fisher Loss for Training Sequence to Sequence Models with Continuous Outputs

The Softmax function is used in the final layer of nearly all existing s...
research
04/04/2019

ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems

Regularization of neural machine translation is still a significant prob...
research
12/19/2014

Embedding Word Similarity with Neural Machine Translation

Neural language models learn word representations, or embeddings, that c...
research
10/29/2019

BPE-Dropout: Simple and Effective Subword Regularization

Subword segmentation is widely used to address the open vocabulary probl...
research
05/09/2017

Word and Phrase Translation with word2vec

Word and phrase tables are key inputs to machine translations, but costl...
