Norm of word embedding encodes information gain

12/19/2022
by Momose Oyama, et al.

Distributed representations of words encode lexical semantic information, but how is that information encoded in word embeddings? Focusing on the skip-gram with negative sampling (SGNS) method, we show theoretically and experimentally that the squared norm of a word embedding encodes the information gain defined by the Kullback-Leibler (KL) divergence from the co-occurrence distribution of the word to the unigram distribution of the corpus. Furthermore, through experiments on keyword extraction, hypernym prediction, and part-of-speech discrimination, we confirm that both the KL divergence and the squared norm of the embedding serve as measures of the informativeness of a word, provided that the bias caused by word frequency is adequately corrected.
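The information-gain measure described above can be computed directly from corpus counts. The following is a minimal sketch (not the authors' implementation) of KL(p(·|w) ‖ p(·)): the divergence from the co-occurrence distribution of a target word, estimated over a symmetric context window, to the corpus unigram distribution. The toy corpus, window size, and function name are illustrative assumptions.

```python
from collections import Counter
from math import log

def kl_information_gain(corpus, target, window=2):
    """Estimate KL(p(.|target) || p(.)): the divergence from the
    co-occurrence distribution of `target` (contexts within a
    symmetric window) to the corpus unigram distribution."""
    # Unigram counts over the whole corpus.
    unigram = Counter(w for sent in corpus for w in sent)
    total = sum(unigram.values())
    # Co-occurrence counts for the target word.
    cooc = Counter()
    for sent in corpus:
        for i, w in enumerate(sent):
            if w != target:
                continue
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    cooc[sent[j]] += 1
    n = sum(cooc.values())
    # KL divergence, summed over context words observed with the target.
    return sum((c / n) * log((c / n) / (unigram[w] / total))
               for w, c in cooc.items())

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "quantum chromodynamics describes the strong interaction".split(),
]
# A topical word should carry more information gain than a function word.
print(kl_information_gain(corpus, "quantum") > kl_information_gain(corpus, "the"))
```

Intuitively, a frequent function word like "the" co-occurs with roughly the same words as the corpus at large, so its divergence is small; a topical word's context distribution departs sharply from the unigram distribution, giving a larger divergence, which the paper relates to a larger squared embedding norm.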


