A Common Semantic Space for Monolingual and Cross-Lingual Meta-Embeddings

01/17/2020
by Iker García, et al.

This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings. Our method integrates multiple word embeddings created from complementary techniques, textual sources, knowledge bases and languages. Existing word vectors are projected to a common semantic space using linear transformations and averaging. With our method, the resulting meta-embeddings maintain the dimensionality of the original embeddings without losing information, while also dealing with the out-of-vocabulary problem. An extensive empirical evaluation demonstrates the effectiveness of our technique with respect to previous work on various intrinsic and extrinsic multilingual evaluations, obtaining competitive results for Semantic Textual Similarity and state-of-the-art performance for word similarity and POS tagging (English and Spanish). The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities. In other words, we can leverage pre-trained source embeddings from a resource-rich language to improve the word representations for under-resourced languages.
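The projection-and-averaging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the linear transformations are learned, so the sketch assumes an orthogonal Procrustes mapping fitted on the vocabulary shared with one anchor embedding, and it assumes that all source embeddings already have the same dimensionality. The names `procrustes_map` and `meta_embed` are hypothetical, and averaging only over the sources that contain a word is one simple way to cover out-of-vocabulary words, not necessarily the paper's.

```python
import numpy as np

def procrustes_map(src, tgt):
    """Orthogonal W minimising ||src @ W - tgt||_F; rows are aligned word vectors."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

def meta_embed(sources, anchor=0):
    """Project every source into the anchor's space, then average per word.

    sources: list of dicts mapping word -> 1-D vector (same dimensionality).
    anchor:  index of the embedding whose space is used as the common space
             (an assumption of this sketch).
    """
    target = sources[anchor]
    projected = []
    for i, emb in enumerate(sources):
        if i == anchor:
            projected.append(dict(emb))
            continue
        # Fit the linear map on words present in both vocabularies.
        shared = sorted(set(emb) & set(target))
        src = np.stack([emb[w] for w in shared])
        tgt = np.stack([target[w] for w in shared])
        w_map = procrustes_map(src, tgt)
        # Apply the map to the whole source vocabulary, not just shared words.
        projected.append({w: v @ w_map for w, v in emb.items()})
    # Average over the sources that contain each word, so a word missing
    # from some sources still receives a vector of the original dimensionality.
    vocab = set().union(*projected)
    return {w: np.mean([p[w] for p in projected if w in p], axis=0) for w in vocab}
```

Calling `meta_embed([emb_a, emb_b])` returns a dictionary over the union of both vocabularies, with every vector having the same length as the inputs. An orthogonal map is chosen here because it preserves distances within each source space; other linear maps would fit the same outline.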


