What the Vec? Towards Probabilistically Grounded Embeddings

05/30/2018
by Carl Allen, et al.

Vector representations, or embeddings, of words are commonly learned with neural network methods, in particular word2vec (W2V). It has been shown that certain word co-occurrence statistics are implicitly captured by properties of W2V vectors, but much remains unknown about them, e.g. whether vector length carries any meaning, or, more generally, how statistics can reliably be framed as vectors at all. By deriving a mathematical link between probabilities and vectors, we justify why W2V works and are able to create embeddings with probabilistically interpretable properties.
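The co-occurrence result the abstract alludes to is usually framed in terms of pointwise mutual information (PMI), e.g. Levy and Goldberg's (2014) observation that skip-gram with negative sampling implicitly factorises a shifted PMI matrix. The sketch below illustrates only that general link, not the construction proposed in this paper: it builds a positive-PMI matrix from a toy corpus and factorises it with SVD so that dot products of the resulting vectors approximate the co-occurrence statistics. The corpus, window size and embedding dimension are made-up placeholders.

```python
import numpy as np

# A minimal, self-contained sketch: co-occurrence statistics (here, positive PMI)
# can be captured by vectors whose dot products approximate those statistics.
# Toy corpus and parameters are arbitrary placeholders, not settings from the paper.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "a cat and a dog played".split(),
]
window = 2

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric word/context co-occurrence counts within the window.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                counts[idx[w], idx[sent[j]]] += 1

# Positive pointwise mutual information: max(0, log p(w,c) / (p(w) p(c))).
total = counts.sum()
p_wc = counts / total
p_w = counts.sum(axis=1, keepdims=True) / total
p_c = counts.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# A low-rank factorisation of the PPMI matrix gives word and context vectors whose
# dot products reconstruct the statistics, analogous to what skip-gram with
# negative sampling does implicitly (Levy & Goldberg, 2014).
d = 2
U, S, Vt = np.linalg.svd(ppmi)
word_vecs = U[:, :d] * np.sqrt(S[:d])
ctx_vecs = Vt[:d].T * np.sqrt(S[:d])

print(vocab)
print(np.round(word_vecs @ ctx_vecs.T, 2))  # approximates the PPMI matrix
```

This only recovers co-occurrence statistics as inner products; the paper's contribution is a probabilistic grounding of such vectors, which the sketch does not attempt to reproduce.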


Related research:

08/02/2015 · Class Vectors: Embedding representation of Document Classes
Distributed representations of words and paragraphs as semantic embeddin...

11/14/2017 · Modeling Semantic Relatedness using Global Relation Vectors
Word embedding models such as GloVe rely on co-occurrence statistics fro...

12/04/2020 · Modelling General Properties of Nouns by Selectively Averaging Contextualised Embeddings
While the success of pre-trained language models has largely eliminated ...

10/18/2019 · Estimator Vectors: OOV Word Embeddings based on Subword and Context Clue Estimates
Semantic representations of words have been successfully extracted from ...

09/05/2018 · Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell
This paper presents an approach for investigating the nature of semantic...

02/26/2019 · Context Vectors are Reflections of Word Vectors in Half the Dimensions
This paper takes a step towards theoretical analysis of the relationship...

12/19/2022 · Multi hash embeddings in spaCy
The distributed representation of symbols is one of the key technologies...
