Bilingual Lexicon Induction through Unsupervised Machine Translation

07/24/2019
by Mikel Artetxe, et al.

A recent line of research has obtained strong results on bilingual lexicon induction by aligning independently trained word embeddings in two languages and using the resulting cross-lingual embeddings to induce word translation pairs through nearest neighbor or related retrieval methods. In this paper, we propose an alternative approach to this problem that builds on recent work on unsupervised machine translation. Instead of directly inducing a bilingual lexicon from cross-lingual embeddings, we use them to build a phrase table, combine it with a language model, and use the resulting machine translation system to generate a synthetic parallel corpus, from which we extract the bilingual lexicon using statistical word alignment techniques. As such, our method can work with any word embedding and cross-lingual mapping technique, and it does not require any additional resource besides the monolingual corpus used to train the embeddings. When evaluated on the exact same cross-lingual embeddings, our proposed method obtains an average improvement of 6 accuracy points over nearest neighbor retrieval and 4 points over CSLS retrieval, establishing a new state of the art on the standard MUSE dataset.
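For readers unfamiliar with the two retrieval baselines named above, the following is a minimal sketch of how they differ. It is not the authors' code and it does not implement the proposed method. It assumes the standard definition of CSLS, CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y), where r_T(x) is the mean cosine similarity between x and its k nearest target words, and r_S(y) is the mean cosine similarity between y and its k nearest source words. The function names and the toy similarity matrix are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(src, tgt):
    """Cosine similarity of every source vector against every target vector."""
    src = [normalize(v) for v in src]
    tgt = [normalize(v) for v in tgt]
    return [[sum(a * b for a, b in zip(s, t)) for t in tgt] for s in src]


def mean_top_k(values, k):
    """Mean of the k largest values (fewer if the list is shorter than k)."""
    top = sorted(values, reverse=True)[:k]
    return sum(top) / len(top)


def nearest_neighbor(sim):
    """For each source word, pick the target index with the highest cosine."""
    return [max(range(len(row)), key=row.__getitem__) for row in sim]


def csls(sim, k=10):
    """For each source word, pick the target index with the highest CSLS score."""
    r_src = [mean_top_k(row, k) for row in sim]        # r_T(x), one per source word
    r_tgt = [mean_top_k(col, k) for col in zip(*sim)]  # r_S(y), one per target word
    picks = []
    for i, row in enumerate(sim):
        scores = [2 * s - r_src[i] - r_tgt[j] for j, s in enumerate(row)]
        picks.append(max(range(len(scores)), key=scores.__getitem__))
    return picks


# Toy similarity matrix (hypothetical values): target 0 behaves like a "hub"
# that is close to both source words.
toy_sim = [[0.9, 0.6],
           [0.7, 0.65]]
print(nearest_neighbor(toy_sim))  # [0, 0]: both source words map to the hub
print(csls(toy_sim, k=1))         # [0, 1]: the hub is penalized for source word 1
```

In the toy matrix, plain nearest neighbor retrieval assigns both source words to the same target, whereas CSLS discounts that target because it is similar to many source words. This hubness effect is the usual motivation for preferring CSLS over nearest neighbor retrieval.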


research
01/06/2019

Improving Unsupervised Word-by-Word Translation with Language Model and Denoising Autoencoder

Unsupervised learning of cross-lingual word embedding offers elegant mat...
research
06/20/2020

Learning aligned embeddings for semi-supervised word translation using Maximum Mean Discrepancy

Word translation is an integral part of language translation. In machine...
research
08/31/2020

Discovering Bilingual Lexicons in Polyglot Word Embeddings

Bilingual lexicons and phrase tables are critical resources for modern M...
research
07/29/2019

CUNI Systems for the Unsupervised News Translation Task in WMT 2019

In this paper we describe the CUNI translation system used for the unsup...
research
11/26/2020

Unsupervised Word Translation Pairing using Refinement based Point Set Registration

Cross-lingual alignment of word embeddings play an important role in kno...
research
05/09/2018

On the Limitations of Unsupervised Bilingual Dictionary Induction

Unsupervised machine translation---i.e., not assuming any cross-lingual ...
research
06/02/2021

Evaluating Word Embeddings with Categorical Modularity

We introduce categorical modularity, a novel low-resource intrinsic metr...
