Learning the Dimensionality of Word Embeddings

11/17/2015
by Eric Nalisnick, et al.

We describe a method for learning word embeddings with data-dependent dimensionality. Our Stochastic Dimensionality Skip-Gram (SD-SG) and Stochastic Dimensionality Continuous Bag-of-Words (SD-CBOW) are nonparametric analogs of Mikolov et al.'s (2013) well-known 'word2vec' models. Vector dimensionality is made dynamic by employing techniques used by Côté & Larochelle (2016) to define an RBM with an infinite number of hidden units. We show qualitatively and quantitatively that SD-SG and SD-CBOW are competitive with their fixed-dimension counterparts while providing a distribution over embedding dimensionalities, which offers a window into how semantics distribute across dimensions.
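
To make the mechanism concrete, below is a minimal NumPy sketch of the core idea the abstract describes: score a (word, context) pair using only the first z embedding dimensions and place a distribution over z, so the effective dimensionality can adapt to the data. The per-dimension penalty, the finite cap max_dims, and the names energy and dim_posterior are illustrative assumptions for this sketch, not the authors' exact formulation or released code.

import numpy as np

# Toy sketch of a stochastic-dimensionality, skip-gram-style scorer.
# NOTE: illustration of the general idea only; the penalty term and the
# function names are assumptions, not the SD-SG implementation.

rng = np.random.default_rng(0)

vocab_size = 1000
max_dims = 50     # finite cap just to keep the demo small; conceptually z is unbounded
penalty = 0.1     # per-dimension cost that keeps mass on small z until data demands more

W = rng.normal(0.0, 0.1, size=(vocab_size, max_dims))  # "input" (target word) vectors
C = rng.normal(0.0, 0.1, size=(vocab_size, max_dims))  # "output" (context word) vectors

def energy(word, context, z):
    # Negative compatibility of (word, context) using only the first z dimensions,
    # plus a linear penalty on z (in the spirit of the infinite-RBM construction).
    return -np.dot(W[word, :z], C[context, :z]) + penalty * z

def dim_posterior(word, context):
    # Normalized distribution over dimensionalities z = 1..max_dims for this pair.
    logits = np.array([-energy(word, context, z) for z in range(1, max_dims + 1)])
    logits -= logits.max()            # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Usage: inspect how probability mass spreads over dimensionalities for one pair.
p_z = dim_posterior(word=3, context=7)
print("most probable dimensionality:", int(p_z.argmax()) + 1)
print("expected dimensionality:", float(np.sum(p_z * np.arange(1, max_dims + 1))))

In the actual models this distribution over dimensionalities is learned jointly with the usual skip-gram/CBOW training objective; the sketch only shows how a posterior over z can be read off per-dimension scores plus a growth penalty.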


research · 07/27/2017
Analysis of Italian Word Embeddings
In this work we analyze the performances of two of the most used word em...

research · 01/05/2016
The Role of Context Types and Dimensionality in Learning Word Embeddings
We provide the first extensive evaluation of how using different types o...

research · 05/29/2020
InfiniteWalk: Deep Network Embeddings as Laplacian Embeddings with a Nonlinearity
The skip-gram model for learning word embeddings (Mikolov et al. 2013) h...

research · 09/04/2019
Empirical Study of Diachronic Word Embeddings for Scarce Data
Word meaning change can be inferred from drifts of time-varying word emb...

research · 01/07/2019
On the Dimensionality of Embeddings for Sparse Features and Data
In this note we discuss a common misconception, namely that embeddings a...

research · 12/30/2020
kōan: A Corrected CBOW Implementation
It is a common belief in the NLP community that continuous bag-of-words ...

research · 05/15/2018
Unsupervised Learning of Style-sensitive Word Vectors
This paper presents the first study aimed at capturing stylistic similar...
