Breaking Sticks and Ambiguities with Adaptive Skip-gram

02/25/2015
by Sergey Bartunov, et al.

The recently proposed Skip-gram model is a powerful method for learning high-dimensional word representations that capture rich semantic relationships between words. However, Skip-gram, like most prior work on learning word representations, does not take word ambiguity into account and maintains only a single representation per word. Although a number of Skip-gram modifications have been proposed to overcome this limitation and learn multi-prototype word representations, they either require a known number of word meanings or learn them using greedy heuristic approaches. In this paper we propose the Adaptive Skip-gram model, a nonparametric Bayesian extension of Skip-gram capable of automatically learning the required number of representations for all words at a desired semantic resolution. We derive an efficient online variational learning algorithm for the model and empirically demonstrate its efficiency on the word-sense induction task.
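
The "breaking sticks" in the title points to the stick-breaking construction that is commonly used to define Dirichlet process priors in nonparametric Bayesian models. The abstract does not spell out the prior, so the following is only a minimal sketch of that general construction, not the authors' implementation; the function name `stick_breaking_weights` and the parameters `alpha`, `truncation`, and `seed` are hypothetical and chosen for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw truncated stick-breaking weights (illustrative sketch only).

    alpha      -- concentration parameter (assumed, not taken from the paper)
    truncation -- maximum number of candidate word senses to draw
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any sense
    for _ in range(truncation):
        # Break off a Beta(1, alpha)-distributed share of what is left.
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


# Prior probabilities of the first five candidate senses of one word,
# plus the mass left over for senses beyond the truncation level.
weights, leftover = stick_breaking_weights(alpha=0.1, truncation=5)
```

In this construction a small `alpha` tends to put most of the mass on the first few senses, while a larger `alpha` spreads it over more of them. That is one plausible way a model of this kind could let the number of representations per word adapt to the data.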

research
06/18/2018

SubGram: Extending Skip-gram Word Representation with Substrings

Skip-gram (word2vec) is a recent method for creating vector representati...
research
06/14/2020

Vietnamese Word Segmentation with SVM: Ambiguity Reduction and Suffix Capture

In this paper, we approach Vietnamese word segmentation as a binary clas...
research
02/17/2021

Contextual Skipgram: Training Word Representation Using Context Information

The skip-gram (SG) model learns word representation by predicting the wo...
research
10/16/2013

Distributed Representations of Words and Phrases and their Compositionality

The recently introduced continuous Skip-gram model is an efficient metho...
research
04/01/2018

Revisiting Skip-Gram Negative Sampling Model with Regularization

We revisit skip-gram negative sampling (SGNS), a popular neural-network ...
research
10/05/2016

Comparative study of LSA vs Word2vec embeddings in small corpora: a case study in dreams database

Word embeddings have been extensively studied in large text datasets. Ho...
research
10/04/2019

DialectGram: Automatic Detection of Dialectal Variation at Multiple Geographic Resolutions

We propose DialectGram, a method to detect dialectical variation across ...
