How to Generate a Good Word Embedding?

07/20/2015
by Siwei Lai, et al.

We analyze three critical components of word embedding training: the model, the corpus, and the training parameters. We systematize existing neural-network-based word embedding algorithms and compare them using the same corpus. We evaluate each word embedding in three ways: analyzing its semantic properties, using it as a feature for supervised tasks, and using it to initialize neural networks. We also provide several simple guidelines for training word embeddings. First, we discover that corpus domain is more important than corpus size. We recommend first choosing a corpus in a domain suited to the desired task; within that domain, a larger corpus yields better results. Second, we find that faster models provide sufficient performance in most cases, and more complex models can be used if the training corpus is sufficiently large. Third, the early stopping criterion for training iterations should rely on the development set of the desired task rather than on the validation loss of the embedding training itself.
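The third guideline can be illustrated with a minimal sketch. It assumes embedding checkpoints are saved at several training iterations and that some function returns a development-set score of the downstream task for each checkpoint. The names `pick_stopping_iteration`, `task_dev_score`, and `patience` are hypothetical and do not come from the paper, and the patience rule is one common early stopping convention rather than the authors' procedure.

```python
from typing import Callable, Iterable, Tuple


def pick_stopping_iteration(
    iterations: Iterable[int],
    task_dev_score: Callable[[int], float],
    patience: int = 2,
) -> Tuple[int, float]:
    """Return (best_iteration, best_score) judged by the task's dev-set score.

    `task_dev_score(i)` is assumed to evaluate the embedding saved at
    iteration `i` on the development set of the downstream task
    (higher is better). Scanning stops once the score has failed to
    improve for `patience` consecutive checkpoints.
    """
    best_iter, best_score = None, float("-inf")
    stale = 0  # consecutive checkpoints without improvement
    for it in iterations:
        score = task_dev_score(it)
        if score > best_score:
            best_iter, best_score = it, score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    if best_iter is None:
        raise ValueError("no checkpoints were provided")
    return best_iter, best_score


# Made-up scores purely for illustration; these are not results from the paper.
toy_scores = {1: 0.60, 2: 0.70, 3: 0.68, 4: 0.66, 5: 0.90}
best = pick_stopping_iteration(sorted(toy_scores), toy_scores.__getitem__)
```

In this toy run the scan stops at iteration 4, after two checkpoints without improvement, and keeps the checkpoint from iteration 2. The point of the sketch is only that the stopping signal comes from the downstream task's development set, not from the embedding model's own validation loss.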


