Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context

06/25/2017
by   Shyam Upadhyay, et al.
0

Word embeddings, which represent a word as a point in a vector space, have become ubiquitous to several NLP tasks. A recent line of work uses bilingual (two languages) corpora to learn a different vector for each sense of a word, by exploiting crosslingual signals to aid sense identification. We present a multi-view Bayesian non-parametric algorithm which improves multi-sense word embeddings by (a) using multilingual (i.e., more than two languages) corpora to significantly improve sense embeddings beyond what one achieves with bilingual information, and (b) uses a principled approach to learn a variable number of senses per word, in a data-driven manner. Ours is the first approach with the ability to leverage multilingual corpora efficiently for multi-sense representation learning. Experiments show that multilingual training significantly improves performance over monolingual and bilingual training, by allowing us to combine different parallel corpora to leverage multilingual context. Multilingual training yields comparable performance to a state of the art mono-lingual model trained on five times more training data.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/21/2021

Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages

In this paper, we advance the current state-of-the-art method for debias...
research
06/30/2016

Learning Crosslingual Word Embeddings without Bilingual Corpora

Crosslingual word embeddings represent lexical items from different lang...
research
03/30/2016

Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders

We present an approach to learning multi-sense word embeddings relying b...
research
05/14/2019

Multilingual Factor Analysis

In this work we approach the task of learning multilingual word represen...
research
12/05/2019

Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi

The success of several architectures to learn semantic representations f...
research
11/27/2016

Semi Supervised Preposition-Sense Disambiguation using Multilingual Data

Prepositions are very common and very ambiguous, and understanding their...

Please sign up or login with your details

Forgot password? Click here to reset