Offline bilingual word vectors, orthogonal transformations and the inverted softmax

02/13/2017
by Samuel L. Smith, et al.
University of Cambridge
babylon health

Bilingual word vectors are usually trained "online". Mikolov et al. showed they can also be found "offline", whereby two pre-trained embeddings are aligned with a linear transformation learned from dictionaries compiled from expert knowledge. In this work, we prove that the linear transformation between two spaces should be orthogonal. This transformation can be obtained using the singular value decomposition. We introduce a novel "inverted softmax" for identifying translation pairs, with which we improve the precision @1 of Mikolov's original mapping from 34% to 43% when translating a test set composed of both common and rare English words into Italian. Orthogonal transformations are more robust to noise, enabling us to learn the transformation without expert bilingual signal by constructing a "pseudo-dictionary" from the identical character strings which appear in both languages, achieving 40% precision on the same test set. Finally, we extend our method to retrieve the true translations of English sentences from a corpus of 200k Italian sentences with a precision @1 of 68%.
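As a rough illustration of two ideas named above (an orthogonal map obtained from the SVD, and a pseudo-dictionary built from identical strings), the NumPy sketch below solves the standard orthogonal Procrustes problem. It assumes row-vector embeddings in which row i of the source matrix and row i of the target matrix hold the two halves of the i-th dictionary pair. The helper names `pseudo_dictionary` and `learn_orthogonal_map` are hypothetical, and the paper's own matrix conventions may differ. In a real pipeline the pseudo-dictionary words would be looked up in each embedding to build `X` and `Y`; here a random rotation stands in for real data.

```python
import numpy as np

def pseudo_dictionary(src_vocab, tgt_vocab):
    """Hypothetical helper: pair each source word with the identical string in the target vocabulary."""
    tgt = set(tgt_vocab)
    return [(w, w) for w in src_vocab if w in tgt]

def learn_orthogonal_map(X, Y):
    """Hypothetical helper: orthogonal W minimising ||X @ W - Y||_F (orthogonal Procrustes).

    Row i of X (source) and row i of Y (target) hold the vectors of the
    i-th dictionary pair. If X.T @ Y = U @ diag(s) @ Vt, then W = U @ Vt.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy check: the target space is an exact rotation of the source space.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))                    # 50 toy "dictionary" pairs, 8-dim vectors
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))    # random orthogonal matrix
Y = X @ Q
W = learn_orthogonal_map(X, Y)
print(np.allclose(W, Q), np.allclose(W.T @ W, np.eye(8)))  # True True

pairs = pseudo_dictionary(["computer", "the", "pizza"], ["il", "computer", "pizza"])
print(pairs)  # [('computer', 'computer'), ('pizza', 'pizza')]
```

Restricting W to be orthogonal preserves lengths and angles in the source space, so a few noisy pairs (such as identical strings that are not true translations) cannot stretch or collapse it the way an unconstrained linear map could.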
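The inverted softmax can likewise be sketched. Assume `S[i, j]` holds the similarity (for unit-length vectors, the cosine) between mapped source word `i` and target word `j`. A minimal version normalises each target column over the source words, rather than each source row over the target words, and then picks the best target for each source. This penalises "hub" targets that sit close to many source words. The function name and the hand-fixed inverse temperature `beta` are illustrative assumptions; this abstract does not say how the authors set it.

```python
import numpy as np

def inverted_softmax_translate(S, beta=10.0):
    """Hypothetical helper: pick a target index for every source row of similarity matrix S.

    P[i -> j] = exp(beta * S[i, j]) / sum_m exp(beta * S[m, j]):
    each target column j is normalised over all source words m, and the
    translation of source i is argmax_j P[i -> j].
    """
    logits = beta * np.asarray(S, dtype=float)
    # Numerically stable log of the column normaliser (log-sum-exp over axis 0).
    col_max = logits.max(axis=0, keepdims=True)
    log_alpha = col_max + np.log(np.exp(logits - col_max).sum(axis=0, keepdims=True))
    return (logits - log_alpha).argmax(axis=1)

# Toy similarities: target 0 is a "hub", fairly close to every source word.
S = np.array([
    [0.90, 0.85, 0.00],
    [0.90, 0.00, 0.85],
    [0.95, 0.00, 0.00],
])
print(S.argmax(axis=1))                          # plain nearest neighbour: [0 0 0]
print(inverted_softmax_translate(S, beta=10.0))  # inverted softmax: [1 2 0]
```

In this toy matrix, plain nearest-neighbour search sends all three source words to the hub target 0. The inverted softmax instead divides by the hub's large column normaliser, so each source word is assigned a distinct target.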

