$\mathbf{w}_i \in \mathbb{R}^d$ is an embedding of the word $i$ when it is a center word,
$\mathbf{c}_i \in \mathbb{R}^d$ is an embedding of the word $i$ when it is a context word.
We follow the assumptions of Assylbekov and Takhanov (2019) on the nature of word vectors, context vectors, and text generation, i.e.
Hypothesis. Under the assumptions 1–3 above, Assylbekov and Takhanov (2019) showed that each word’s vector $\mathbf{w}_i$ splits into two approximately equally sized subvectors $\mathbf{x}_i$ and $\mathbf{y}_i$, and that the model (1) for generating a word $j$ in the context of a word $i$ can be rewritten as $p(j \mid i) \propto \exp\left(\langle \mathbf{x}_j, \mathbf{x}_i \rangle - \langle \mathbf{y}_j, \mathbf{y}_i \rangle\right)$.
Interestingly, embeddings of the first type ($\mathbf{x}_i$ and $\mathbf{x}_j$) are responsible for pulling the word $j$ into the context of the word $i$, while embeddings of the second type ($\mathbf{y}_i$ and $\mathbf{y}_j$) are responsible for pushing the word $j$ away from the context of the word $i$. We hypothesize that the $\mathbf{x}$-embeddings are more related to semantics, whereas the $\mathbf{y}$-embeddings are more related to syntax. In what follows we provide a motivating example for this hypothesis and then empirically validate it through controlled experiments.
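The decomposition of the SGNS score into a pulling and a pushing term can be checked numerically. The sketch below is illustrative only: the dimensionality and the random vectors are hypothetical stand-ins, not values from a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 100  # hypothetical embedding dimensionality

# Hypothetical subvectors; under the tied-weights assumption the full
# vectors are w_j = [x_j; y_j] and c_i = [x_i; -y_i].
x_i, y_i = rng.normal(size=d // 2), rng.normal(size=d // 2)
x_j, y_j = rng.normal(size=d // 2), rng.normal(size=d // 2)

w_j = np.concatenate([x_j, y_j])    # word vector of j
c_i = np.concatenate([x_i, -y_i])   # context vector of i

# The SGNS score <w_j, c_i> splits into a pulling (semantic) term
# minus a pushing (syntactic) term:
score = w_j @ c_i
pull = x_j @ x_i
push = y_j @ y_i
assert np.isclose(score, pull - push)
```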
|Data||Embeddings||Size||WordSim (Finkelstein et al.)||MEN (Bruni et al.)||M. Turk (Radinsky et al.)||Rare Words (Luong, Socher, and Manning)||MSR|
Table 1: Evaluation of word vectors and subvectors on the analogy tasks (Google and MSR) and on the similarity tasks (the rest). For word similarities, the evaluation metric is Spearman’s correlation with the human ratings; for word analogies, it is the percentage of correct answers. Model sizes are given as numbers of trainable parameters.
Consider the phrase
|the dog barking at strangers|
The word ‘barking’ appears in the context of the word ‘dog’, but the word vector $\mathbf{w}_{\text{barking}}$ is not the closest to the word vector $\mathbf{w}_{\text{dog}}$ (see Table 2). Instead, these vectors are split
in such a way that the quantity $\langle \mathbf{x}_{\text{dog}}, \mathbf{x}_{\text{barking}} \rangle - \langle \mathbf{y}_{\text{dog}}, \mathbf{y}_{\text{barking}} \rangle$ is large enough. We can interpret this as follows: the word ‘barking’ is semantically close enough to the word ‘dog’ but is not the closest one, e.g. $\mathbf{x}_{\text{puppy}}$ is much closer to $\mathbf{x}_{\text{dog}}$ than $\mathbf{x}_{\text{barking}}$ is; on the other hand, the word ‘barking’ fits better syntactically next to the word ‘dog’ than ‘puppy’ does, i.e. $\langle \mathbf{y}_{\text{dog}}, \mathbf{y}_{\text{barking}} \rangle < \langle \mathbf{y}_{\text{dog}}, \mathbf{y}_{\text{puppy}} \rangle$.
This combination of semantic proximity (a large $\langle \mathbf{x}_{\text{dog}}, \mathbf{x}_{\text{barking}} \rangle$) and syntactic fit (a small $\langle \mathbf{y}_{\text{dog}}, \mathbf{y}_{\text{barking}} \rangle$) allows the word ‘barking’ to appear in the context of the word ‘dog’.
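The arithmetic behind this example can be illustrated with toy numbers. The 2-d subvectors below are invented for illustration and are not taken from a trained model:

```python
import numpy as np

# Invented 2-d subvectors for illustration only (not from a trained model).
x = {"dog":     np.array([1.00, 0.00]),
     "puppy":   np.array([0.98, 0.10]),   # semantically closest to 'dog'
     "barking": np.array([0.80, 0.30])}
y = {"dog":     np.array([0.50, 0.50]),
     "puppy":   np.array([0.60, 0.70]),   # large y-overlap: pushed away
     "barking": np.array([0.10, -0.20])}  # small y-overlap: fits syntactically

def score(a, b):
    # <x_a, x_b> - <y_a, y_b>: semantic pull minus syntactic push
    return x[a] @ x[b] - y[a] @ y[b]

# 'puppy' is the semantic nearest neighbour of 'dog', yet 'barking'
# gets the higher overall score because its pushing term is smaller.
assert x["dog"] @ x["puppy"] > x["dog"] @ x["barking"]
assert score("dog", "barking") > score("dog", "puppy")
```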
In this section we empirically verify our hypothesis. We train SGNS with tied weights on two widely used datasets, text8 and enwik9 (http://mattmahoney.net/dc/textdata.html; the enwik9 data was processed with the Perl script wikifil.pl provided on the same webpage), which gives us the word embeddings as well as their partitions into subvectors: $\mathbf{w}_i = [\mathbf{x}_i; \mathbf{y}_i]$, $\mathbf{c}_i = [\mathbf{x}_i; -\mathbf{y}_i]$.
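Assuming the split runs along the first and second halves of the dimensions, as in Assylbekov and Takhanov (2019), extracting the subvectors from a trained embedding matrix is a simple slicing operation. The matrix below is a random stand-in for a trained one:

```python
import numpy as np

# Random stand-in for a trained SGNS embedding matrix W of shape (V, d);
# each row is assumed to split as w_i = [x_i; y_i].
V, d = 10_000, 100
W = np.random.default_rng(1).normal(size=(V, d)).astype(np.float32)

X = W[:, : d // 2]   # x-subvectors: first half of the dimensions
Y = W[:, d // 2 :]   # y-subvectors: second half of the dimensions

# Under tied weights, the context matrix just flips the sign of the y-part.
C = np.concatenate([X, -Y], axis=1)
assert X.shape == Y.shape == (V, d // 2)
assert C.shape == W.shape
```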
The source code that reproduces our experiments is available at https://github.com/MaxatTezekbayev/Semantics–and-Syntax-related-Subvectors-in-the-Skip-gram-Embeddings.
$\mathbf{x}$-Subvectors Are Related to Semantics
We evaluate the whole vectors $\mathbf{w}_i$, as well as the subvectors $\mathbf{x}_i$ and $\mathbf{y}_i$, on standard semantic tasks: word similarity and word analogy. We used the hyperwords tool of Levy, Goldberg, and Dagan (2015), and we refer the reader to their paper for the evaluation methodology. The results are provided in Table 1. As one can see, the $\mathbf{x}$-subvectors outperform the whole $\mathbf{w}$-vectors in the similarity tasks and show competitive performance in the analogy tasks, whereas the $\mathbf{y}$-subvectors demonstrate poor performance in these tasks. This shows that the $\mathbf{x}$-subvectors carry more semantic information than the $\mathbf{y}$-subvectors.
$\mathbf{y}$-Subvectors Are Related to Syntax
We train a softmax regression by feeding in the embedding $\mathbf{w}_i$ of the current word to predict the part-of-speech (POS) tag $t$ of the next word: $\hat{p}(t \mid \mathbf{w}_i) = \mathrm{softmax}(\mathbf{A}\mathbf{w}_i + \mathbf{b})$.
We evaluate the whole vectors $\mathbf{w}_i$ and the subvectors $\mathbf{x}_i$ and $\mathbf{y}_i$ on tagging the Brown corpus with the Universal POS tags. The resulting accuracies are provided in Table 3.
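Such a softmax regression can be sketched in a few lines of NumPy. The data below are random stand-ins; the real experiment feeds trained embeddings and the Universal POS tag of the following word from the Brown corpus:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_tags, n = 100, 12, 512   # dims, Universal POS tags (12), toy sample size

# Random stand-ins for the inputs and targets of the probe.
E = rng.normal(size=(n, d)).astype(np.float32)   # current-word embeddings
t = rng.integers(0, n_tags, size=n)              # next-word tag ids

A = np.zeros((d, n_tags), dtype=np.float32)      # softmax-regression weights
b = np.zeros(n_tags, dtype=np.float32)

for _ in range(200):                             # plain full-batch gradient descent
    logits = E @ A + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)            # softmax probabilities
    g = p.copy()
    g[np.arange(n), t] -= 1.0                    # d(cross-entropy)/d(logits)
    A -= 0.1 * (E.T @ g) / n
    b -= 0.1 * g.mean(axis=0)

accuracy = (p.argmax(axis=1) == t).mean()        # training accuracy
```

To compare subvectors, the same probe is simply refit with `E` replaced by the $\mathbf{x}$- or $\mathbf{y}$-halves of the embeddings.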
|Embeddings||Size||Trained on text8||Trained on enwik9|
We can see that the $\mathbf{y}$-subvectors are more suitable for POS tagging than the $\mathbf{x}$-subvectors, which means that the $\mathbf{y}$-parts carry more syntactic information than the $\mathbf{x}$-parts.
Theoretical analysis of word embeddings gives us a better understanding of their properties. Moreover, theory may provide interesting hypotheses on the nature and structure of word embeddings, and such hypotheses can be verified empirically, as is done in this paper.
This work is supported by the Nazarbayev University Collaborative Research Program 091019CRP2109, and by the Committee of Science of the Ministry of Education and Science of the Republic of Kazakhstan, IRN AP05133700.
Assylbekov, Z., and Takhanov, R. (2019). Context vectors are reflections of word vectors in half the dimensions. Journal of Artificial Intelligence Research 66, pp. 225–242.
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Proceedings of NeurIPS, pp. 3111–3119.