Fast Extraction of Word Embedding from Q-contexts

09/15/2021
by Junsheng Kong, et al.

The notion of word embedding plays a fundamental role in natural language processing (NLP). However, pre-training word embeddings for a very large-scale vocabulary is computationally challenging for most existing methods. In this work, we show that with merely a small fraction of contexts (Q-contexts) that are typical in the whole corpus (and their mutual information with words), one can construct high-quality word embeddings with negligible errors. Mutual information between contexts and words can be encoded canonically as a sampling state; thus, Q-contexts can be constructed quickly. Furthermore, we present an efficient and effective WEQ method, which is capable of extracting word embeddings directly from these typical contexts. In practical scenarios, our algorithm runs 11∼13 times faster than well-established methods. By comparing with well-known methods such as matrix factorization, word2vec, GloVe, and fastText, we demonstrate that our method achieves comparable performance on a variety of downstream NLP tasks, while maintaining run-time and resource advantages over all these baselines.
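The abstract does not come with code, but the pipeline it describes can be illustrated with a small sketch: sample a small set of Q typical context columns from a word-context co-occurrence matrix, form the word-context mutual-information (PMI) block, and factorize it to obtain word embeddings. This is a minimal illustration assuming a PMI-plus-truncated-SVD formulation in the spirit of matrix-factorization embeddings; the names (pmi, q_idx, word_vectors) and the frequency-weighted sampling rule are illustrative assumptions, not the authors' WEQ implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy word-context co-occurrence counts: rows are words, columns are contexts.
V, C = 1000, 5000          # vocabulary size, total number of contexts
counts = rng.poisson(0.05, size=(V, C)).astype(float)

def pmi(block, eps=1e-12):
    """Pointwise mutual information for a sub-matrix of co-occurrence counts."""
    total = block.sum()
    p_w = block.sum(axis=1, keepdims=True) / total   # word marginals
    p_c = block.sum(axis=0, keepdims=True) / total   # context marginals
    p_wc = block / total                             # joint probabilities
    return np.log((p_wc + eps) / (p_w @ p_c + eps))

# "Q-contexts": sample Q context columns, weighted by frequency so that
# typical (high-coverage) contexts are more likely to be chosen.
# This particular sampling rule is an assumption for illustration.
Q = 200
probs = counts.sum(axis=0) / counts.sum()
q_idx = rng.choice(C, size=Q, replace=False, p=probs)

# Embed words via a truncated SVD of the word/Q-context PMI block.
M = pmi(counts[:, q_idx])
U, S, _ = np.linalg.svd(M, full_matrices=False)
d = 50                                    # embedding dimension
word_vectors = U[:, :d] * np.sqrt(S[:d])  # V x d word embedding matrix
print(word_vectors.shape)                 # (1000, 50)
```

Because the SVD here runs on a V x Q block rather than the full V x C matrix, the factorization cost shrinks roughly by a factor of C/Q, which is consistent with the run-time advantage the abstract reports.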

Related research

07/23/2020
Musical Word Embedding: Bridging the Gap between Listening Contexts and Music
Word embedding pioneered by Mikolov et al. is a staple technique for wor...

10/26/2021
Task-Specific Dependency-based Word Embedding Methods
Two task-specific dependency-based word embedding methods are proposed f...

01/25/2020
An Analysis of Word2Vec for the Italian Language
Word representation is fundamental in NLP tasks, because it is precisely...

06/03/2016
Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory
In this paper we take a state-of-the-art model for distributed word repr...

07/05/2020
Improving Chinese Segmentation-free Word Embedding With Unsupervised Association Measure
Recent work on segmentation-free word embedding (sembei) developed a new ...

07/29/2016
A Novel Bilingual Word Embedding Method for Lexical Translation Using Bilingual Sense Clique
Most of the existing methods for bilingual word embedding only consider ...

11/06/2019
Word Embedding Algorithms as Generalized Low Rank Models and their Canonical Form
Word embedding algorithms produce very reliable feature representations ...
