Offline bilingual word vectors, orthogonal transformations and the inverted softmax

by Samuel L. Smith, et al.
University of Cambridge
babylon health

Usually bilingual word vectors are trained "online". Mikolov et al. showed they can also be found "offline", whereby two pre-trained embeddings are aligned with a linear transformation, using dictionaries compiled from expert knowledge. In this work, we prove that the linear transformation between two spaces should be orthogonal. This transformation can be obtained using the singular value decomposition. We introduce a novel "inverted softmax" for identifying translation pairs, with which we improve the precision @1 of Mikolov's original mapping from 34% to 43%, when translating a test set composed of both common and rare English words into Italian. Orthogonal transformations are more robust to noise, enabling us to learn the transformation without expert bilingual signal by constructing a "pseudo-dictionary" from the identical character strings which appear in both languages, achieving 40% precision on the same test set. Finally, we extend our method to retrieve the true translations of English sentences from a corpus of 200k Italian sentences with a precision @1 of 68%.
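
The abstract names two concrete steps: fitting an orthogonal map with the singular value decomposition, and scoring translation candidates with an inverted softmax. The NumPy sketch below illustrates both under stated assumptions and is not the authors' code. The function names, the temperature `beta` and the toy similarity matrix are hypothetical. The abstract does not spell out the inverted softmax formula, so the sketch assumes it means normalising the exponentiated similarities over source words rather than over target words, and it leaves out any further normalisation constant.

```python
import numpy as np


def fit_orthogonal_map(src, tgt):
    """Return the orthogonal W that minimises ||src @ W - tgt||_F.

    src, tgt: (n_pairs, dim) arrays; row i of src is paired with row i of tgt.
    This is the standard orthogonal Procrustes solution via the SVD.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def inverted_softmax(sim, beta=10.0):
    """Softmax over the source axis (rows) instead of the target axis (columns).

    sim[i, j] is the similarity of source word i to target word j.
    Assumption: beta is a hypothetical temperature, not a value from the paper.
    """
    # Subtracting each column's maximum keeps exp() stable and cancels out
    # when the column is normalised.
    weights = np.exp(beta * (sim - sim.max(axis=0, keepdims=True)))
    return weights / weights.sum(axis=0, keepdims=True)


# Toy check 1: recover a known rotation from noiseless paired vectors.
rng = np.random.default_rng(0)
src = rng.normal(size=(30, 5))
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal matrix
tgt = src @ rotation
w = fit_orthogonal_map(src, tgt)

# Toy check 2: target 0 is a "hub" that is close to every source word.
# The intended translation of source word i is target word i.
sim = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.7, 0.1],
    [0.8, 0.2, 0.7],
])
probs = inverted_softmax(sim)

print(sim.argmax(axis=1))    # nearest neighbour picks the hub: [0 0 0]
print(probs.argmax(axis=1))  # inverted softmax picks: [0 1 2]
```

In the toy matrix, target 0 is the nearest neighbour of every source word. Normalising over source words discounts that column, so each source word recovers its own target.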




