Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space

04/24/2015
by Arvind Neelakantan, et al.

There is rising interest in vector-space word embeddings and their use in NLP, especially given recent methods for their fast estimation at very large scale. Nearly all this work, however, assumes a single vector per word type, ignoring polysemy and thus jeopardizing the usefulness of these embeddings for downstream tasks. We present an extension to the Skip-gram model that efficiently learns multiple embeddings per word type. It differs from recent related work by jointly performing word sense discrimination and embedding learning, by non-parametrically estimating the number of senses per word type, and by its efficiency and scalability. We present new state-of-the-art results on the word-similarity-in-context task and demonstrate the model's scalability by training on a single machine on a corpus of nearly 1 billion tokens in less than 6 hours.
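
The abstract does not spell out how the number of senses per word type is estimated, so the following is only an illustrative sketch of one common way to do it, not the authors' implementation. It assumes that each occurrence of a word is represented by the average of its context-word vectors, that each sense keeps a running cluster of such context vectors, and that a new sense is created whenever the best cosine similarity falls below a cutoff. The names `SenseClusters`, `context_vector`, and the `threshold` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def context_vector(context_embeddings):
    """Average the embeddings of the words surrounding one occurrence."""
    n = len(context_embeddings)
    return [sum(col) / n for col in zip(*context_embeddings)]


class SenseClusters:
    """Context clusters for one word type; each cluster stands for one sense."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed similarity cutoff for opening a new sense
        self.sums = []              # running sum of context vectors, one per sense
        self.counts = []            # number of contexts assigned to each sense

    def assign(self, context_vec):
        """Return the sense index for this context, creating a sense if needed."""
        best, best_sim = -1, -1.0
        for k, (total, n) in enumerate(zip(self.sums, self.counts)):
            center = [x / n for x in total]
            sim = cosine(center, context_vec)
            if sim > best_sim:
                best, best_sim = k, sim
        if best == -1 or best_sim < self.threshold:
            # No existing sense is similar enough: open a new one.
            self.sums.append(list(context_vec))
            self.counts.append(1)
            return len(self.sums) - 1
        # Otherwise fold this context into the closest existing sense.
        self.sums[best] = [a + b for a, b in zip(self.sums[best], context_vec)]
        self.counts[best] += 1
        return best
```

In a full model of this kind, the sense index returned by `assign` would select which embedding of the word receives the Skip-gram update for that occurrence, so that sense discrimination and embedding learning proceed together. That training step is omitted here.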


research
07/04/2015

AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes

We present AutoExtend, a system to learn embeddings for synsets and lexe...
research
03/07/2020

Discovering linguistic (ir)regularities in word embeddings through max-margin separating hyperplanes

We experiment with new methods for learning how related words are positi...
research
08/16/2021

IsoScore: Measuring the Uniformity of Vector Space Utilization

The recent success of distributed word representations has led to an inc...
research
05/11/2020

Multidirectional Associative Optimization of Function-Specific Word Representations

We present a neural framework for learning associations between interrel...
research
05/01/2016

Text-mining the NeuroSynth corpus using Deep Boltzmann Machines

Large-scale automated meta-analysis of neuroimaging data has recently es...
research
03/22/2017

Supervised Typing of Big Graphs using Semantic Embeddings

We propose a supervised algorithm for generating type embeddings in the ...
research
09/08/2019

Distributed Word2Vec using Graph Analytics Frameworks

Word embeddings capture semantic and syntactic similarities of words, re...
