Using k-way Co-occurrences for Learning Word Embeddings

09/05/2017
by   Danushka Bollegala, et al.
0

Co-occurrences between two words provide useful insights into the semantics of those words. Consequently, numerous prior work on word embedding learning have used co-occurrences between two words as the training signal for learning word embeddings. However, in natural language texts it is common for multiple words to be related and co-occurring in the same context. We extend the notion of co-occurrences to cover k(≥2)-way co-occurrences among a set of k-words. Specifically, we prove a theoretical relationship between the joint probability of k(≥2) words, and the sum of ℓ_2 norms of their embeddings. Next, we propose a learning objective motivated by our theoretical result that utilises k-way co-occurrences for learning word embeddings. Our experimental results show that the derived theoretical relationship does indeed hold empirically, and despite data sparsity, for some smaller k values, k-way embeddings perform comparably or better than 2-way embeddings in a range of tasks.

READ FULL TEXT
11/12/2020

Deconstructing word embedding algorithms

Word embeddings are reliable feature representations of words used to ob...
08/28/2018

WiC: 10,000 Example Pairs for Evaluating Context-Sensitive Representations

By design, word embeddings are unable to model the dynamic nature of wor...
11/25/2019

Towards robust word embeddings for noisy texts

Research on word embeddings has mainly focused on improving their perfor...
02/26/2019

Context Vectors are Reflections of Word Vectors in Half the Dimensions

This paper takes a step towards theoretical analysis of the relationship...
09/03/2015

Encoding Prior Knowledge with Eigenword Embeddings

Canonical correlation analysis (CCA) is a method for reducing the dimens...
05/13/2022

IRB-NLP at SemEval-2022 Task 1: Exploring the Relationship Between Words and Their Semantic Representations

What is the relation between a word and its description, or a word and i...
01/28/2019

Analogies Explained: Towards Understanding Word Embeddings

Word embeddings generated by neural network methods such as word2vec (W2...