Bilingual Embeddings with Random Walks over Multilingual Wordnets

04/23/2018
by   J. Goikoetxea, et al.
0

Bilingual word embeddings represent words of two languages in the same space, and allow to transfer knowledge from one language to the other without machine translation. The main approach is to train monolingual embeddings first and then map them using bilingual dictionaries. In this work, we present a novel method to learn bilingual embeddings based on multilingual knowledge bases (KB) such as WordNet. Our method extracts bilingual information from multilingual wordnets via random walks and learns a joint embedding space in one go. We further reinforce cross-lingual equivalence adding bilingual con- straints in the loss function of the popular skipgram model. Our experiments involve twelve cross-lingual word similarity and relatedness datasets in six lan- guage pairs covering four languages, and show that: 1) random walks over mul- tilingual wordnets improve results over just using dictionaries; 2) multilingual wordnets on their own improve over text-based systems in similarity datasets; 3) the good results are consistent for large wordnets (e.g. English, Spanish), smaller wordnets (e.g. Basque) or loosely aligned wordnets (e.g. Italian); 4) the combination of wordnets and text yields the best results, above mapping-based approaches. Our method can be applied to richer KBs like DBpedia or Babel- Net, and can be easily extended to multilingual embeddings. All software and resources are open source.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/27/2018

Learning Multilingual Word Embeddings in a Latent Metric Space: A Geometric Approach

We propose a novel geometric approach for learning bilingual mappings gi...
research
07/26/2023

The flow of ideas in word embeddings

The flow of ideas has been extensively studied by physicists, psychologi...
research
04/11/2017

ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge

This paper describes Luminoso's participation in SemEval 2017 Task 2, "M...
research
11/08/2019

Should All Cross-Lingual Embeddings Speak English?

Most of recent work in cross-lingual word embeddings is severely Angloce...
research
02/05/2016

Massively Multilingual Word Embeddings

We introduce new methods for estimating and evaluating embeddings of wor...
research
03/27/2019

Image search using multilingual texts: a cross-modal learning approach between image and text Maxime Portaz Qwant Research

Multilingual (or cross-lingual) embeddings represent several languages i...
research
04/10/2020

A Simple Approach to Learning Unsupervised Multilingual Embeddings

Recent progress on unsupervised learning of cross-lingual embeddings in ...

Please sign up or login with your details

Forgot password? Click here to reset