Distributed Word2Vec using Graph Analytics Frameworks

09/08/2019
by Gurbinder Gill, et al.

Word embeddings capture semantic and syntactic similarities of words, represented as vectors. Word2Vec is a popular implementation of word embeddings; it takes as input a large corpus of text and learns a model that maps each unique word in that corpus to other contextually relevant words. After training, Word2Vec's internal vector representations map the unique words in the corpus into a vector space, and these vectors are then used in many downstream tasks. Training these models requires significant computational resources (training time is often measured in days) and is difficult to parallelize. Most word embedding training uses stochastic gradient descent (SGD), an "inherently" sequential algorithm in which, at each step, the processing of the current example depends on the parameters learned from the previous examples. Prior approaches to parallelizing SGD do not honor these dependencies and therefore potentially suffer from poor convergence. This paper introduces GraphWord2Vec, a distributed Word2Vec algorithm that formulates the Word2Vec training process as a distributed graph problem and thus leverages state-of-the-art distributed graph analytics frameworks, such as D-Galois and Gemini, that scale to large distributed clusters. GraphWord2Vec also demonstrates how to use model combiners to honor the data dependencies in SGD and thus scale without giving up convergence. We show that GraphWord2Vec scales linearly up to 32 machines while converging as fast as a sequential run in terms of epochs, reducing training time by 14x.
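To make the graph formulation concrete, below is a minimal sketch of the idea in NumPy: words become vertices that hold embedding vectors, each observed (target, context) pair becomes an edge, SGD steps run along edges, and replicated per-partition models are periodically merged. The names Graph, train_edge, and combine_models are hypothetical, and averaging replicas is only a simplistic stand-in for the model combiners described in the paper; this is not the GraphWord2Vec implementation or the D-Galois/Gemini API.

```python
import copy

import numpy as np

DIM = 16     # embedding dimension
LR = 0.025   # SGD learning rate
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class Graph:
    """Words are vertices holding input/output vectors; each observed
    (target, context) pair from the corpus window is an edge."""
    def __init__(self, vocab_size, edges):
        self.syn0 = rng.uniform(-0.5, 0.5, (vocab_size, DIM)) / DIM  # input (word) vectors
        self.syn1 = np.zeros((vocab_size, DIM))                      # output (context) vectors
        self.edges = edges                                           # list of (target, context) pairs
        self.vocab_size = vocab_size

def train_edge(g, target, context, negatives=5):
    """One skip-gram-with-negative-sampling SGD step along a single edge."""
    v = g.syn0[target]
    grad_v = np.zeros(DIM)
    samples = [(context, 1.0)] + [(int(rng.integers(g.vocab_size)), 0.0)
                                  for _ in range(negatives)]
    for word, label in samples:
        u = g.syn1[word]
        step = LR * (label - sigmoid(v @ u))
        grad_v += step * u
        g.syn1[word] += step * v
    g.syn0[target] += grad_v

def combine_models(replicas):
    """Simplistic stand-in for a model combiner: average the replicated
    parameters so per-partition SGD updates are reconciled each epoch."""
    syn0 = np.mean([r.syn0 for r in replicas], axis=0)
    syn1 = np.mean([r.syn1 for r in replicas], axis=0)
    for r in replicas:
        r.syn0[:], r.syn1[:] = syn0, syn1

# Toy usage: two partitions start from the same model, train on disjoint
# edge sets, and are merged by the combiner after every epoch.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
base = Graph(vocab_size=4, edges=edges)
parts = [copy.deepcopy(base) for _ in range(2)]
parts[0].edges, parts[1].edges = edges[:2], edges[2:]
for epoch in range(5):
    for g in parts:
        for target, context in g.edges:
            train_edge(g, target, context)
    combine_models(parts)
```

The synchronization point at the end of each epoch is where the paper's model combiners would act; the naive averaging used here only shows where that step fits in the training loop, not how the actual combiners preserve SGD's data dependencies.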

Related research

05/22/2017  Parallel Stochastic Gradient Descent with Sound Combiners
Stochastic gradient descent (SGD) is a well known method for regression ...

11/12/2019  word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement
Deep learning natural language processing models often use vector word e...

06/04/2020  Scaling Distributed Training with Adaptive Summation
Stochastic gradient descent (SGD) is an inherently sequential training a...

04/24/2015  Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space
There is rising interest in vector-space word embeddings and their use i...

07/01/2019  Few-Shot Representation Learning for Out-Of-Vocabulary Words
Existing approaches for learning word embeddings often assume there are ...

12/07/2018  Asynchronous Training of Word Embeddings for Large Text Corpora
Word embeddings are a powerful approach for analyzing language and have ...

03/08/2018  Improving Optimization in Models With Continuous Symmetry Breaking
Many loss functions in representation learning are invariant under a con...
