
Model Choices Influence Attributive Word Associations: A Semi-supervised Analysis of Static Word Embeddings

by Geetanjali Bihani, et al.

Static word embeddings encode word associations that are used extensively in downstream NLP tasks. Although prior studies have discussed the nature of such word associations in terms of the biases and lexical regularities they capture, how these associations vary with the embedding training procedure remains unclear. This work addresses this gap by assessing attributive word associations across five different static word embedding architectures, analyzing the impact of the choice of model architecture, context learning flavor, and training corpora. Our approach uses a semi-supervised clustering method to cluster annotated proper nouns and adjectives based on their word embedding features, revealing the underlying attributive word associations formed in the embedding space without introducing confirmation bias. Our results show that the choice of context learning flavor during embedding training (CBOW vs. skip-gram) affects both the distinguishability of word associations and the sensitivity of word embeddings to deviations in the training corpora. Moreover, we show empirically that, even when trained on the same corpora, there is significant inter-model disparity and intra-model similarity in the encoded word associations across different word embedding models, indicating that each embedding architecture shapes its embedding space in a characteristic way.
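
The abstract does not specify which semi-supervised clustering algorithm is used, so the following is only an illustrative sketch of the general idea: a few annotated words act as seeds that fix the initial cluster centers, and the remaining words are assigned by distance in the embedding space. It assumes a seeded k-means variant; the function name `seeded_kmeans`, the `-1` convention for unannotated words, and the toy 2-D "embeddings" are all hypothetical and not taken from the paper.

```python
import numpy as np

def seeded_kmeans(X, seed_labels, n_iter=20):
    """Cluster rows of X, using annotated seeds to initialise the centroids.

    X           : (n, d) array of word embedding vectors.
    seed_labels : (n,) int array; class id for annotated words, -1 otherwise
                  (hypothetical convention for this sketch).
    """
    classes = np.unique(seed_labels[seed_labels >= 0])
    # Initialise each centroid as the mean of its annotated seed words.
    centroids = np.stack([X[seed_labels == c].mean(axis=0) for c in classes])
    for _ in range(n_iter):
        # Squared Euclidean distance from every word to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = classes[dists.argmin(axis=1)]
        # Annotated words always keep their annotated cluster.
        assign = np.where(seed_labels >= 0, seed_labels, assign)
        new_centroids = np.stack([X[assign == c].mean(axis=0) for c in classes])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign, centroids

# Toy stand-in for embeddings: two well-separated groups of 2-D vectors.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 2)),
    rng.normal(5.0, 0.1, size=(10, 2)),
])
seed_labels = np.full(20, -1)
seed_labels[0] = 0    # one annotated word for the first group
seed_labels[10] = 1   # one annotated word for the second group

assign, centroids = seeded_kmeans(X, seed_labels)
```

In a real setting the rows of `X` would be vectors looked up from a trained embedding model, and how cleanly the clusters recover the annotated groups is one way to gauge how distinguishable the encoded associations are.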


The Dependence on Frequency of Word Embedding Similarity Measures

Recent research has shown that static word embeddings can encode word fr...

Human-in-the-Loop Refinement of Word Embeddings

Word embeddings are a fixed, distributional representation of the contex...

The Undesirable Dependence on Frequency of Gender Bias Metrics Based on Word Embeddings

Numerous works use word embedding-based metrics to quantify societal bia...

Contextualized Word Embeddings Encode Aspects of Human-Like Word Sense Knowledge

Understanding context-dependent variation in word meanings is a key aspe...

Dynamic Word Embeddings

We present a probabilistic language model for time-stamped text data whi...

Norm of word embedding encodes information gain

Distributed representations of words encode lexical semantic information...