Embedding Words as Distributions with a Bayesian Skip-gram Model

11/29/2017
by Arthur Bražinskas, et al.

We introduce a method for embedding words as probability densities in a low-dimensional space. Rather than assuming that a word embedding is fixed across the entire text collection, as in standard word embedding methods, our Bayesian model generates it from a word-specific prior density for each occurrence of a given word. Intuitively, for each word, the prior density encodes the distribution of its potential 'meanings'. These prior densities are conceptually similar to Gaussian embeddings. Unlike Gaussian embeddings, however, we can also obtain context-specific densities: they encode uncertainty about the sense of a word given its context and correspond to posterior distributions within our model. The context-dependent densities have many potential applications; for example, we show that they can be used directly in the lexical substitution task. We describe an effective estimation method based on the variational autoencoding framework and demonstrate that our embeddings achieve competitive results on standard benchmarks.
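The abstract compresses the whole pipeline: a Gaussian prior per word type, a context-conditioned Gaussian posterior produced by an inference network, and training via the evidence lower bound (ELBO) as in variational autoencoders. The sketch below is a minimal PyTorch illustration of that structure under stated assumptions, not the paper's implementation: it assumes diagonal Gaussians, a mean-pooled bag-of-words inference network, and a softmax reconstruction of context words, and every name in it (BayesianSkipGram, posterior, elbo) is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesianSkipGram(nn.Module):
    """Toy Bayesian skip-gram sketch: one Gaussian prior per word type,
    plus an inference network for context-specific posteriors."""

    def __init__(self, vocab_size, dim):
        super().__init__()
        # Word-specific prior density N(mu_w, diag(exp(logvar_w))).
        self.prior_mu = nn.Embedding(vocab_size, dim)
        self.prior_logvar = nn.Embedding(vocab_size, dim)
        # Inference network: q(z | w, context) from the center word
        # and a mean-pooled context window.
        self.enc = nn.Embedding(vocab_size, dim)
        self.post_mu = nn.Linear(2 * dim, dim)
        self.post_logvar = nn.Linear(2 * dim, dim)
        # Decoder scoring context words from a sampled meaning z.
        self.out = nn.Linear(dim, vocab_size)

    def posterior(self, center, context):
        # center: (batch,) word ids; context: (batch, window) word ids.
        c = self.enc(context).mean(dim=1)                # (batch, dim)
        h = torch.tanh(torch.cat([self.enc(center), c], dim=-1))
        return self.post_mu(h), self.post_logvar(h)

    def elbo(self, center, context):
        mu_q, logvar_q = self.posterior(center, context)
        # Reparameterization trick: sample z ~ q(z | w, context).
        z = mu_q + torch.exp(0.5 * logvar_q) * torch.randn_like(mu_q)
        # Reconstruction: predict each context word from z.
        logits = self.out(z)                             # (batch, vocab)
        n_ctx = context.size(1)
        rec = -F.cross_entropy(
            logits.unsqueeze(1).expand(-1, n_ctx, -1)
                  .reshape(-1, logits.size(-1)),
            context.reshape(-1),
        )
        # Closed-form KL(q || p_w) between two diagonal Gaussians.
        mu_p = self.prior_mu(center)
        logvar_p = self.prior_logvar(center)
        kl = 0.5 * (
            logvar_p - logvar_q
            + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
            - 1.0
        ).sum(-1).mean()
        return rec - kl  # maximize: reconstruction term minus KL
```

A usage sketch on random toy data (the shapes are the only point here):

```python
model = BayesianSkipGram(vocab_size=10_000, dim=100)
center = torch.randint(0, 10_000, (32,))
context = torch.randint(0, 10_000, (32, 4))  # 4 context words per center
loss = -model.elbo(center, context)          # minimize the negative ELBO
loss.backward()
```

After training, prior_mu and prior_logvar play the role of the word-level densities described in the abstract, while posterior() yields the context-specific density for a particular occurrence, which is, per the abstract, the quantity usable for lexical substitution.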



Related research

12/20/2014 · Word Representations via Gaussian Embedding
Current work in lexical distributed representations maps each word to a ...

02/16/2018 · Deep Generative Model for Joint Alignment and Word Representation
This work exploits translation data as a source of semantically relevant...

12/19/2022 · Norm of word embedding encodes information gain
Distributed representations of words encode lexical semantic information...

04/26/2018 · Hierarchical Density Order Embeddings
By representing words with probability densities rather than point vecto...

09/01/2020 · Document Similarity from Vector Space Densities
We propose a computationally light method for estimating similarities be...

03/30/2021 · Representing ELMo embeddings as two-dimensional text online
We describe a new addition to the WebVectors toolkit which is used to se...

11/19/2015 · Gaussian Mixture Embeddings for Multiple Word Prototypes
Recently, word representation has been increasingly focused on for its e...
