On the Limitations of Unsupervised Bilingual Dictionary Induction

05/09/2018
by Anders Søgaard, et al.

Unsupervised machine translation (i.e., translation that assumes no cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora) seems impossible; nevertheless, Lample et al. (2018) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised alignment of word embedding spaces for bilingual dictionary induction (Conneau et al., 2018), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent-marking, and when the monolingual corpora come from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction, and we establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
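The identical-words trick can be illustrated with a minimal sketch. The abstract does not say how the seed pairs are used, so the code below makes an assumption: words spelled identically in both vocabularies serve as a seed dictionary, and an orthogonal map between the two embedding spaces is fitted to those pairs with Procrustes analysis. The function names, the toy vocabularies, and the random embeddings are all hypothetical and only serve the illustration; this is not the authors' implementation.

```python
import numpy as np

def identical_word_seed(src_vocab, tgt_vocab):
    """Return (source index, target index) pairs for identically spelled words."""
    tgt_index = {w: j for j, w in enumerate(tgt_vocab)}
    return [(i, tgt_index[w]) for i, w in enumerate(src_vocab) if w in tgt_index]

def procrustes(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F (assumed alignment step)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def induce(src_emb, tgt_emb, W):
    """Map source vectors with W and pick the nearest target by cosine similarity."""
    mapped = src_emb @ W
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    return (mapped @ tgt.T).argmax(axis=1)

# Toy data (hypothetical): the target space is an exact rotation of the source space.
rng = np.random.default_rng(0)
src_vocab = ["berlin", "taxi", "radio", "pizza", "hund", "katze"]
tgt_vocab = ["radio", "dog", "berlin", "cat", "taxi", "pizza"]
gold = np.array([2, 4, 0, 5, 1, 3])           # gold target index for each source word
src_emb = rng.normal(size=(6, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
tgt_emb = np.empty((6, 3))
tgt_emb[gold] = src_emb @ Q

seed = identical_word_seed(src_vocab, tgt_vocab)  # only the four shared spellings
src_ids, tgt_ids = zip(*seed)
W = procrustes(src_emb[list(src_ids)], tgt_emb[list(tgt_ids)])
pred = induce(src_emb, tgt_emb, W)
for i, j in enumerate(pred):
    print(src_vocab[i], "->", tgt_vocab[j])
```

In this toy setting the two spaces are exactly isometric, so the map fitted on the four shared words also recovers the two pairs that were never in the seed. Real embedding spaces are only approximately isometric, which is the kind of limitation the abstract points to.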
