Linear Algebraic Structure of Word Senses, with Applications to Polysemy

01/14/2016
by Sanjeev Arora, et al.

Word embeddings are ubiquitous in NLP and information retrieval, but it is unclear what they represent when a word is polysemous, i.e., has multiple senses. Here it is shown that the multiple senses of a word reside in linear superposition within its embedding and can be recovered by simple sparse coding. The method applies to several embedding methods, including word2vec, and its success is explained mathematically using the random walk on discourses model (Arora et al., 2016). A novel aspect of the technique is that each word sense is accompanied by one of about 2000 discourse atoms. These atoms give a succinct description of which other words co-occur with that sense. Discourse atoms appear to be of independent interest, and they make the method potentially more useful than traditional clustering-based approaches to polysemy.
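To make the "linear superposition plus sparse coding" idea concrete, here is a minimal sketch. It makes several assumptions that are not from the paper:

- The dictionary is a set of random unit vectors, a hypothetical stand-in for discourse atoms learned from real embeddings.
- The "polysemous" word vector is built by hand as a weighted sum of two atoms.
- Orthogonal matching pursuit (OMP) is used as the sparse decoder. It is a standard greedy sparse-coding solver, not necessarily the one the authors use.

The names omp, atoms and word_vec, and the sizes d = 300 and m = 2000 (the latter echoing "about 2000" atoms), are illustrative only.

```python
import numpy as np


def omp(dictionary: np.ndarray, signal: np.ndarray, sparsity: int):
    """Orthogonal matching pursuit: greedily choose `sparsity` atoms (rows of
    `dictionary`) and least-squares fit `signal` on the chosen atoms."""
    residual = signal.copy()
    support: list[int] = []
    coef = np.zeros(0)
    for _ in range(sparsity):
        # Pick the atom most correlated (in absolute value) with the residual.
        scores = np.abs(dictionary @ residual)
        scores[support] = -np.inf  # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # Jointly refit all chosen atoms, then update the residual.
        basis = dictionary[support].T  # shape (d, len(support))
        coef, *_ = np.linalg.lstsq(basis, signal, rcond=None)
        residual = signal - basis @ coef
    return support, coef


rng = np.random.default_rng(0)
d, m = 300, 2000  # embedding dimension and number of atoms (illustrative)

# Hypothetical stand-in for learned discourse atoms: random unit vectors.
atoms = rng.standard_normal((m, d))
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)

# Toy polysemous word vector: a weighted superposition of two "sense" atoms.
word_vec = 0.6 * atoms[3] + 0.4 * atoms[17]

support, coef = omp(atoms, word_vec, sparsity=2)
print(sorted(support), np.round(coef, 3))
```

Because random high-dimensional unit vectors are nearly orthogonal, the greedy step picks out the two planted atoms and the least-squares refit recovers their weights. A full pipeline would also have to learn the atoms themselves from the embedding vocabulary. This sketch covers only the decoding step for a given dictionary.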

