IsoVec: Controlling the Relative Isomorphism of Word Embedding Spaces

10/11/2022
by Kelly Marchisio, et al.

The ability to extract high-quality translation dictionaries from monolingual word embedding spaces depends critically on the geometric similarity of the spaces, that is, their degree of "isomorphism." We address the root cause of faulty cross-lingual mapping: word embedding training produces underlying spaces that are non-isomorphic. We incorporate global measures of isomorphism directly into the skipgram loss function. This increases the relative isomorphism of the trained word embedding spaces and improves their ability to be mapped to a shared cross-lingual space. The result is improved bilingual lexicon induction under general data conditions, under domain mismatch, and when the spaces are trained with dissimilar algorithms. We release IsoVec at https://github.com/kellymarchisio/isovec.
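The abstract's central idea is to add a global isomorphism measure to the skipgram objective. That idea can be sketched as a weighted sum of two losses. The NumPy code below is a minimal illustration under stated assumptions, not the IsoVec implementation. It assumes a skip-gram negative-sampling loss, a fixed reference embedding space, and a seed dictionary of translation pairs. It also assumes an orthogonal-Procrustes residual as the isomorphism penalty. The function names and the weight `beta` are hypothetical.

```python
import numpy as np

# Hypothetical sketch, not the IsoVec code. The Procrustes penalty is one
# assumed choice of global isomorphism measure among several possible ones.


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)


def skipgram_ns_loss(w_in, w_out, centers, contexts, negatives):
    """Mean skip-gram negative-sampling loss over a batch.

    w_in, w_out: (V, d) input/output embedding matrices.
    centers, contexts: (B,) word indices of observed (center, context) pairs.
    negatives: (B, K) indices of sampled negative context words.
    """
    c = w_in[centers]                                    # (B, d)
    pos = np.sum(c * w_out[contexts], axis=1)            # (B,) positive scores
    neg = np.einsum("bd,bkd->bk", c, w_out[negatives])   # (B, K) negative scores
    loss = -(log_sigmoid(pos) + np.sum(log_sigmoid(-neg), axis=1))
    return loss.mean()


def procrustes_penalty(x, y):
    """Mean squared residual after the best orthogonal map from x onto y.

    x, y: (n, d) embeddings of n seed translation pairs (row i of x <-> row i of y).
    """
    u, _, vt = np.linalg.svd(x.T @ y)
    w = u @ vt               # orthogonal (d, d) minimiser of ||x @ w - y||_F
    resid = x @ w - y
    return np.sum(resid ** 2) / x.shape[0]


def isovec_style_loss(w_in, w_out, centers, contexts, negatives,
                      ref_space, seed_src, seed_ref, beta=1.0):
    # Hypothetical combined objective: skip-gram loss plus a weighted global
    # isomorphism penalty against a fixed (assumed) reference space.
    sg = skipgram_ns_loss(w_in, w_out, centers, contexts, negatives)
    iso = procrustes_penalty(w_in[seed_src], ref_space[seed_ref])
    return sg + beta * iso


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, d = 10, 4
    w_in, w_out, ref = (rng.normal(size=(V, d)) for _ in range(3))
    centers, contexts = np.array([0, 1, 2]), np.array([3, 4, 5])
    negatives = rng.integers(0, V, size=(3, 2))
    seeds = np.arange(5)  # assume word i in both spaces is a translation pair
    print(isovec_style_loss(w_in, w_out, centers, contexts, negatives,
                            ref, seeds, seeds, beta=0.5))
```

The sketch computes only the loss value. Training would also need its gradient, which you could get by porting the same expressions to an autodiff framework. The Procrustes residual is zero exactly when, over the seed pairs, one space is an orthogonal rotation of the other. This is one concrete way to express "more isomorphic."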
