Transparent, Efficient, and Robust Word Embedding Access with WOMBAT

07/02/2018
by Mark-Christoph Müller, et al.

We present WOMBAT, a Python tool which supports NLP practitioners in accessing word embeddings from code. WOMBAT addresses common research problems, including unified access, scaling, and robust and reproducible preprocessing. Code that uses WOMBAT for accessing word embeddings is not only cleaner, more readable, and easier to reuse, but also much more efficient than code using standard in-memory methods: a Python script using WOMBAT for evaluating seven large word embedding collections (8.7M embedding vectors in total) on a simple SemEval sentence similarity task involving 250 raw sentence pairs completes in under ten seconds end-to-end on a standard notebook computer.
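
The contrast the abstract draws between on-demand access and standard in-memory loading can be illustrated with a minimal sketch. It is not WOMBAT's API: it assumes a disk-backed store built on Python's standard `sqlite3` module, and the table name `emb`, the helper functions, and the toy vectors are all hypothetical. It fetches only the vectors a sentence pair actually needs, then compares the averaged sentence vectors by cosine similarity.

```python
import math
import sqlite3
import struct


def build_store(conn, vectors):
    """Write word -> vector pairs into a table as packed float32 blobs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emb (word TEXT PRIMARY KEY, vec BLOB)"
    )
    rows = [(w, struct.pack(f"<{len(v)}f", *v)) for w, v in vectors.items()]
    conn.executemany("INSERT OR REPLACE INTO emb VALUES (?, ?)", rows)
    conn.commit()


def lookup(conn, words):
    """Fetch vectors for the requested words only; unknown words are skipped."""
    found = {}
    for w in set(words):
        row = conn.execute(
            "SELECT vec FROM emb WHERE word = ?", (w,)
        ).fetchone()
        if row is not None:
            blob = row[0]
            found[w] = list(struct.unpack(f"<{len(blob) // 4}f", blob))
    return found


def sentence_vector(conn, sentence):
    """Average the vectors of all known tokens (naive whitespace tokenizer)."""
    tokens = sentence.lower().split()
    vecs = lookup(conn, tokens)
    present = [vecs[t] for t in tokens if t in vecs]
    if not present:
        return None
    dim = len(present[0])
    return [sum(v[i] for v in present) / len(present) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Toy data (hypothetical); a real collection would live in a file on disk.
TOY = {"cat": [1.0, 0.0], "dog": [0.5, 0.5], "car": [0.0, 1.0]}
conn = sqlite3.connect(":memory:")
build_store(conn, TOY)

similarity = cosine(
    sentence_vector(conn, "Cat car"), sentence_vector(conn, "dog")
)
```

Because each lookup touches only the rows for the requested words, memory use in this sketch depends on the input sentences rather than on the size of the embedding collection. Swapping `":memory:"` for a file path would keep the collection on disk between runs.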

