Improving Unsupervised Word-by-Word Translation with Language Model and Denoising Autoencoder

01/06/2019
by Yunsu Kim et al.

Unsupervised learning of cross-lingual word embedding offers elegant matching of words across languages, but has fundamental limitations in translating sentences. In this paper, we propose simple yet effective methods to improve word-by-word translation of cross-lingual embeddings, using only monolingual corpora but without any back-translation. We integrate a language model for context-aware search, and use a novel denoising autoencoder to handle reordering. Our system surpasses state-of-the-art unsupervised neural translation systems without costly iterative training. We also analyze the effect of vocabulary size and denoising type on the translation performance, which provides a better understanding of learning the cross-lingual word embedding and its usage in translation.
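The abstract outlines two ingredients: candidate target words come from nearest-neighbor lookup in a shared cross-lingual embedding space, and a target-side language model steers a beam search toward fluent output (the denoising autoencoder for reordering is a separately trained component, not shown here). Below is a minimal sketch of that lexicon-plus-LM search, not the authors' implementation; the toy vocabularies, random embeddings, bigram log-probabilities, beam size, and interpolation weights are all illustrative assumptions.

```python
import numpy as np

# Toy cross-lingual embeddings, assumed to be already mapped into a shared space.
src_vocab = ["das", "haus", "ist", "gross"]
tgt_vocab = ["the", "house", "is", "big", "a", "home"]
rng = np.random.default_rng(0)
src_emb = rng.normal(size=(len(src_vocab), 8))
tgt_emb = rng.normal(size=(len(tgt_vocab), 8))

def topk_candidates(src_word, k=3):
    """k nearest target words by cosine similarity: the word-by-word lexicon."""
    v = src_emb[src_vocab.index(src_word)]
    sims = tgt_emb @ v / (np.linalg.norm(tgt_emb, axis=1) * np.linalg.norm(v) + 1e-9)
    best = np.argsort(-sims)[:k]
    return [(tgt_vocab[i], float(sims[i])) for i in best]

# Toy bigram language model over the target side; unseen bigrams get a flat penalty.
bigram_logprob = {("<s>", "the"): -0.5, ("the", "house"): -0.7,
                  ("house", "is"): -0.6, ("is", "big"): -0.8}

def lm_score(prev, word):
    return bigram_logprob.get((prev, word), -5.0)

def beam_translate(src_sentence, beam_size=4, lex_weight=1.0, lm_weight=1.0):
    """Context-aware search: score each hypothesis by lexicon similarity plus LM log-probability."""
    beams = [(["<s>"], 0.0)]  # (target prefix, accumulated score)
    for src_word in src_sentence:
        expanded = []
        for prefix, score in beams:
            for cand, sim in topk_candidates(src_word):
                new_score = score + lex_weight * sim + lm_weight * lm_score(prefix[-1], cand)
                expanded.append((prefix + [cand], new_score))
        beams = sorted(expanded, key=lambda h: -h[1])[:beam_size]
    best, _ = max(beams, key=lambda h: h[1])
    return best[1:]  # drop the <s> marker

print(beam_translate(["das", "haus", "ist", "gross"]))
```

With real mapped embeddings and a proper n-gram language model, the same loop corresponds to the context-aware word-by-word search described above; handling local reordering would additionally require the denoising autoencoder mentioned in the abstract.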

Related research

09/10/2018  Unsupervised Cross-lingual Transfer of Word Embedding Spaces
Cross-lingual transfer of word embeddings aims to establish the semantic...

07/24/2019  Bilingual Lexicon Induction through Unsupervised Machine Translation
A recent research line has obtained strong results on bilingual lexicon...

02/08/2021  SLUA: A Super Lightweight Unsupervised Word Alignment Model via Cross-Lingual Contrastive Learning
Word alignment is essential for the down-streaming cross-lingual languag...

01/18/2018  An Iterative Closest Point Method for Unsupervised Word Translation
Unsupervised word translation from non-parallel inter-lingual corpora ha...

10/11/2022  IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces
The ability to extract high-quality translation dictionaries from monoli...

11/16/2017  Addressing Cross-Lingual Word Sense Disambiguation on Low-Density Languages: Application to Persian
We explore the use of unsupervised methods in Cross-Lingual Word Sense D...

08/31/2018  Gromov-Wasserstein Alignment of Word Embedding Spaces
Cross-lingual or cross-domain correspondences play key roles in tasks ra...
