Model Choices Influence Attributive Word Associations: A Semi-supervised Analysis of Static Word Embeddings

12/14/2020
by Geetanjali Bihani et al.

Static word embeddings encode word associations that are extensively utilized in downstream NLP tasks. Although prior studies have discussed the nature of such word associations in terms of the biases and lexical regularities they capture, how word associations vary with the embedding training procedure remains largely unexplored. This work addresses this gap by assessing attributive word associations across five different static word embedding architectures, analyzing the impact of the choice of model architecture, context learning flavor, and training corpora. Our approach uses a semi-supervised clustering method to cluster annotated proper nouns and adjectives based on their word embedding features, revealing the underlying attributive word associations formed in the embedding space without introducing confirmation bias. Our results show that the choice of context learning flavor during embedding training (CBOW vs. skip-gram) affects both how distinguishable the encoded word associations are and how sensitive the embeddings are to deviations in the training corpora. Moreover, we show empirically that, even when trained over the same corpora, there is significant inter-model disparity and intra-model similarity in the encoded word associations across different word embedding models, indicating that each embedding architecture shapes its embedding space in characteristic ways.
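The abstract does not spell out the exact clustering procedure, so the following is only a minimal illustrative sketch of the general idea: train the two context learning flavors (CBOW vs. skip-gram) over the same corpus, then run a seeded k-means, a simple semi-supervised clustering, over a handful of annotated words. It assumes gensim and scikit-learn; the toy corpus, seed words, and labels are placeholders, not the paper's data.

    # Illustrative sketch: compare attributive word associations under
    # CBOW vs. skip-gram via seeded (semi-supervised) k-means clustering.
    # Corpus, seed words, and labels are hypothetical placeholders.
    import numpy as np
    from gensim.models import Word2Vec
    from sklearn.cluster import KMeans

    corpus = [
        ["alice", "is", "a", "brilliant", "engineer"],
        ["bob", "is", "a", "friendly", "teacher"],
        # ... more tokenized sentences ...
    ]

    # Annotated seeds: a few adjectives per attribute class (placeholder labels).
    seeds = {0: ["brilliant"], 1: ["friendly"]}
    # Unannotated proper nouns / nouns to be clustered by embedding features.
    unlabeled = ["alice", "bob", "engineer", "teacher"]

    for sg in (0, 1):  # sg=0 -> CBOW, sg=1 -> skip-gram
        model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=sg)

        # Initialize centroids from the mean embedding of each seed class,
        # then assign the unlabeled words: a simple seeded k-means.
        init = np.stack([
            np.mean([model.wv[w] for w in words], axis=0)
            for _, words in sorted(seeds.items())
        ])
        km = KMeans(n_clusters=len(seeds), init=init, n_init=1).fit(
            np.stack([model.wv[w] for w in unlabeled])
        )
        flavor = "CBOW" if sg == 0 else "skip-gram"
        print(flavor, dict(zip(unlabeled, km.labels_)))

Seeding the centroids from annotated words (rather than labeling the clusters after the fact) is what keeps the analysis free of confirmation bias in the sense described above: cluster membership is decided purely by embedding geometry, with the annotations fixed in advance.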

Related research

11/15/2022 · The Dependence on Frequency of Word Embedding Similarity Measures
Recent research has shown that static word embeddings can encode word fr...

10/06/2021 · Human-in-the-Loop Refinement of Word Embeddings
Word embeddings are a fixed, distributional representation of the contex...

05/13/2023 · Frequency-aware Dimension Selection for Static Word Embedding by Mixed Product Distance
Static word embedding is still useful, particularly for context-unavaila...

10/25/2020 · Contextualized Word Embeddings Encode Aspects of Human-Like Word Sense Knowledge
Understanding context-dependent variation in word meanings is a key aspe...

05/19/2020 · Word-Emoji Embeddings from large scale Messaging Data reflect real-world Semantic Associations of Expressive Icons
We train word-emoji embeddings on large scale messaging data obtained fr...

02/27/2017 · Dynamic Word Embeddings
We present a probabilistic language model for time-stamped text data whi...

12/19/2022 · Norm of word embedding encodes information gain
Distributed representations of words encode lexical semantic information...
