Beyond Offline Mapping: Learning Cross Lingual Word Embeddings through Context Anchoring

12/31/2020
by Aitor Ormazabal, et al.

Recent research on cross-lingual word embeddings has been dominated by unsupervised mapping approaches that align monolingual embeddings. Such methods critically rely on those embeddings having a similar structure, but it was recently shown that the separate training in different languages causes departures from this assumption. In this paper, we propose an alternative approach that does not have this limitation, while requiring a weak seed dictionary (e.g., a list of identical words) as the only form of supervision. Rather than aligning two fixed embedding spaces, our method works by fixing the target language embeddings, and learning a new set of embeddings for the source language that are aligned with them. To that end, we use an extension of skip-gram that leverages translated context words as anchor points, and incorporates self-learning and iterative restarts to reduce the dependency on the initial dictionary. Our approach outperforms conventional mapping methods on bilingual lexicon induction, and obtains competitive results in the downstream XNLI task.
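
The core idea of anchoring on translated context words can be illustrated with a toy sketch. This is not the authors' implementation: it is a heavily simplified skip-gram with negative sampling in pure Python, and every name in it (`train_anchored_skipgram`, `seed_dict`, `tgt_vecs`, the hyperparameters) is a hypothetical choice made for illustration. It assumes that a source context word with a dictionary entry is replaced by the frozen target-language vector of its translation, and that only source word vectors are updated. Context words without a translation are simply skipped here, and the self-learning and iterative restarts described in the abstract are omitted.

```python
import math
import random


def sigmoid(x):
    # Clamp the input to keep math.exp from overflowing.
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, x))))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_anchored_skipgram(corpus, seed_dict, tgt_vecs, dim, window=2,
                            negatives=2, lr=0.05, epochs=50, seed=0):
    """Learn source-language vectors against frozen target-language vectors.

    corpus    -- list of source-language tokens
    seed_dict -- source word -> target word (the weak seed dictionary)
    tgt_vecs  -- target word -> vector; read only, never updated
    """
    rng = random.Random(seed)
    tgt_vocab = sorted(tgt_vecs)
    src_vecs = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]
                for w in sorted(set(corpus))}

    for _ in range(epochs):
        for i, center in enumerate(corpus):
            lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
            for j in range(lo, hi):
                # Simplification (assumption): only translated context
                # words act as anchors; all other contexts are skipped.
                if j == i or corpus[j] not in seed_dict:
                    continue
                anchor = seed_dict[corpus[j]]
                if anchor not in tgt_vecs:
                    continue

                # One positive anchor plus random negatives, all taken
                # from the frozen target vocabulary.
                samples = [(anchor, 1.0)]
                samples += [(rng.choice(tgt_vocab), 0.0)
                            for _ in range(negatives)]

                v = src_vecs[center]
                grad = [0.0] * dim
                for word, label in samples:
                    if label == 0.0 and word == anchor:
                        continue  # do not treat the anchor as a negative
                    c = tgt_vecs[word]
                    g = label - sigmoid(dot(v, c))
                    for k in range(dim):
                        grad[k] += g * c[k]
                # Only the source vector moves; target vectors stay fixed.
                for k in range(dim):
                    v[k] += lr * grad[k]
    return src_vecs
```

In a toy setting with target vectors for "dog" and "cat" and a seed dictionary mapping "perro" to "dog" and "gato" to "cat", a source word that mostly co-occurs with "perro" should end up closer to the fixed "dog" vector than to "cat", even though that word itself never appears in the dictionary.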


