Improving Cross-Lingual Word Embeddings by Meeting in the Middle

08/27/2018
by   Yerai Doval, et al.

Cross-lingual word embeddings are becoming increasingly important in multilingual NLP. Recently, it has been shown that these embeddings can be effectively learned by aligning two disjoint monolingual vector spaces through linear transformations, using no more than a small bilingual dictionary as supervision. In this work, we propose to apply an additional transformation after the initial alignment step, which moves cross-lingual synonyms towards a middle point between them. The aim of this transformation is to obtain a better cross-lingual integration of the vector spaces. In addition, and perhaps surprisingly, the monolingual spaces are also improved by this transformation. This is in contrast to the original alignment, which is typically learned such that the structure of the monolingual spaces is preserved. Our experiments confirm that the resulting cross-lingual embeddings outperform state-of-the-art models in both monolingual and cross-lingual evaluation tasks.
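To make the two-step idea concrete, below is a minimal sketch of the kind of pipeline the abstract describes, with several assumptions: the initial alignment is taken to be orthogonal Procrustes over a small bilingual dictionary, and the second step is approximated here by per-language least-squares maps that pull each aligned translation pair towards its midpoint. The function names, the toy data, and the exact fitting procedure are illustrative, not the authors' implementation.

```python
import numpy as np

def procrustes_alignment(X_src, X_tgt):
    """Orthogonal map W minimising ||X_src @ W - X_tgt|| over dictionary pairs."""
    U, _, Vt = np.linalg.svd(X_src.T @ X_tgt)
    return U @ Vt

def midpoint_map(X, M):
    """Unconstrained least-squares linear map sending rows of X towards the midpoints M."""
    W, *_ = np.linalg.lstsq(X, M, rcond=None)
    return W

# Toy stand-ins for the embeddings of the words in a small bilingual dictionary.
rng = np.random.default_rng(0)
X_src = rng.standard_normal((500, 300))
X_tgt = rng.standard_normal((500, 300))

# Step 1: initial cross-lingual alignment, which preserves monolingual structure.
W_align = procrustes_alignment(X_src, X_tgt)
X_src_aligned = X_src @ W_align

# Step 2: "meeting in the middle" -- move both sides towards the average
# of each aligned translation pair.
midpoints = (X_src_aligned + X_tgt) / 2.0
W_src = midpoint_map(X_src_aligned, midpoints)
W_tgt = midpoint_map(X_tgt, midpoints)

# The learned maps would then be applied to the full vocabulary of each language.
refined_src = X_src_aligned @ W_src
refined_tgt = X_tgt @ W_tgt
```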

Related research

10/16/2019
Meemi: Finding the Middle Ground in Cross-lingual Word Embeddings
Word embeddings have become a standard resource in the toolset of any Na...

11/28/2014
Coarse-grained Cross-lingual Alignment of Comparable Texts with Topic Models and Encyclopedic Knowledge
We present a method for coarse-grained cross-lingual alignment of compar...

11/01/2018
Learning Unsupervised Word Mapping by Maximizing Mean Discrepancy
Cross-lingual word embeddings aim to capture common linguistic regularit...

10/10/2019
Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework
Learning multilingual representations of text has proven a successful me...

09/11/2021
The Impact of Positional Encodings on Multilingual Compression
In order to preserve word-order information in a non-autoregressive sett...

05/17/2019
Learning Cross-lingual Embeddings from Twitter via Distant Supervision
Cross-lingual embeddings represent the meaning of words from different l...

06/23/2019
Cross-lingual Data Transformation and Combination for Text Classification
Text classification is a fundamental task for text data mining. In order...
