Wikipedia2Vec: An Optimized Tool for Learning Embeddings of Words and Entities from Wikipedia

12/15/2018
by Ikuya Yamada, et al.

We present Wikipedia2Vec, an open source tool for learning embeddings of words and entities from Wikipedia. This tool enables users to easily obtain high-quality embeddings of words and entities from a Wikipedia dump with a single command. The learned embeddings can be used as features in downstream natural language processing (NLP) models. The tool can be installed via PyPI. The source code, documentation, and pretrained embeddings for 12 major languages can be obtained at http://wikipedia2vec.github.io.
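As a quick, hedged sketch of that single-command workflow (the command and method names follow the project documentation; the dump file and MODEL_FILE paths below are placeholders):

    # Install from PyPI, then train embeddings from a Wikipedia dump
    # with a single command (file names here are placeholders):
    #
    #   pip install wikipedia2vec
    #   wikipedia2vec train enwiki-latest-pages-articles.xml.bz2 MODEL_FILE

    # Query the learned embeddings from Python:
    from wikipedia2vec import Wikipedia2Vec

    model = Wikipedia2Vec.load('MODEL_FILE')       # model written by the train step
    word_vec = model.get_word_vector('tokyo')      # embedding of a word
    entity_vec = model.get_entity_vector('Tokyo')  # embedding of a Wikipedia entity

The returned vectors are numpy arrays, so they can be passed directly as features to downstream NLP models.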
