Principal Word Vectors

07/09/2020
by Ali Basirat, et al.

We generalize principal component analysis for embedding words into a vector space. The generalization is made at two major levels. The first is to generalize the concept of a corpus as a counting process defined by three key elements: a vocabulary set, a feature (annotation) set, and a context. This generalization enables the principal word embedding method to generate word vectors with respect to different types of contexts and different types of annotations provided for a corpus. The second is to generalize the transformation step used in most word embedding methods. To this end, we define two levels of transformation. The first is a quadratic transformation, which accounts for different types of weighting over the vocabulary units and contextual features. The second is an adaptive non-linear transformation, which reshapes the data distribution so that it is better suited to principal component analysis. The effect of these generalizations on the word vectors is studied intrinsically with regard to the spread and the discriminability of the word vectors. We also provide an extrinsic evaluation of the contribution of the principal word vectors on a word similarity benchmark and the task of dependency parsing. Our experiments conclude with a comparison between the principal word vectors and sets of word vectors generated by popular word embedding methods. The results of our intrinsic evaluation show that the spread and the discriminability of the principal word vectors are higher than those of the other word embedding methods. The results of the extrinsic evaluation show that the principal word vectors outperform some word embedding methods and are on par with the popular ones.
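To make the abstract's recipe concrete, the following is a minimal sketch of the generic pipeline it describes: build a word-context co-occurrence matrix from a corpus, transform it, and extract principal components as word vectors. The toy corpus, the symmetric word-window context, and the element-wise square root (standing in for the paper's quadratic weighting and adaptive non-linear transformation) are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

# Illustrative toy corpus (an assumption; any tokenized corpus works here).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "a cat and a dog played".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Step 1: the corpus as a counting process -- co-occurrence counts between
# vocabulary units and contextual features (here: words in a +/-2 window).
window = 2
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[idx[w], idx[sent[j]]] += 1.0

# Step 2: transformation -- an element-wise square root is used as a simple
# stand-in that damps the heavy-tailed count distribution before PCA.
transformed = np.sqrt(counts)

# Step 3: PCA -- mean-center and take the top-k components via SVD; the
# scaled left singular vectors serve as the word vectors.
centered = transformed - transformed.mean(axis=0)
U, S, _ = np.linalg.svd(centered, full_matrices=False)
k = 5
word_vectors = U[:, :k] * S[:k]

print(word_vectors.shape)                      # (vocabulary size, k)
print("cat ->", word_vectors[idx["cat"]][:3])  # first 3 dimensions
```

Swapping in an annotated corpus or a different context definition at step 1, or a different weighting at step 2, is what the paper's generalizations make systematic; the PCA step itself is unchanged.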
