The Role of Context Types and Dimensionality in Learning Word Embeddings

01/05/2016
by Oren Melamud, et al.

We provide the first extensive evaluation of how using different types of context to learn skip-gram word embeddings affects performance on a wide range of intrinsic and extrinsic NLP tasks. Our results suggest that while intrinsic tasks tend to exhibit a clear preference for particular context types and for higher dimensionality, more careful tuning is required to find the optimal settings for most of the extrinsic tasks we considered. Furthermore, for these extrinsic tasks, we find that once the benefit of increasing the embedding dimensionality is mostly exhausted, simple concatenation of word embeddings learned with different context types can yield further performance gains. As an additional contribution, we propose a new variant of the skip-gram model that learns word embeddings from weighted contexts of substitute words.
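The two ideas central to the abstract can be illustrated with a minimal sketch: extracting skip-gram (target, context) pairs under a linear-window context definition, and concatenating per-word vectors from two embedding tables learned with different context types. All function names and the toy vectors below are illustrative assumptions, not the authors' code; real embeddings would come from separately trained skip-gram models.

```python
# Hedged sketch: window-based skip-gram pair extraction and word-wise
# concatenation of embeddings from two context types. Names here
# (window_pairs, concat_embeddings) are hypothetical.

def window_pairs(tokens, window=2):
    """Return (target, context) pairs for a symmetric linear window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def concat_embeddings(emb_a, emb_b):
    """Concatenate two embedding tables word by word.

    Only words present in both tables are kept; each resulting vector
    has dimensionality dim(a) + dim(b), matching the paper's simple
    concatenation setup.
    """
    return {w: emb_a[w] + emb_b[w]  # list concatenation
            for w in emb_a if w in emb_b}

# Toy 2-d vectors standing in for embeddings learned with window
# contexts (emb_win) and, say, dependency contexts (emb_dep).
emb_win = {"bank": [0.1, 0.2], "river": [0.3, 0.4]}
emb_dep = {"bank": [0.5, 0.6], "river": [0.7, 0.8]}
combined = concat_embeddings(emb_win, emb_dep)  # 4-d vectors
```

The same concatenation applies unchanged to the substitute-word context variant the paper proposes: only the pair-extraction step (and its context weighting) differs between context types.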


Related research

07/27/2017: Analysis of Italian Word Embeddings
In this work we analyze the performances of two of the most used word em...

11/17/2015: Learning the Dimensionality of Word Embeddings
We describe a method for learning word embeddings with data-dependent di...

12/30/2020: kōan: A Corrected CBOW Implementation
It is a common belief in the NLP community that continuous bag-of-words ...

08/18/2020: Word2vec Skip-gram Dimensionality Selection via Sequential Normalized Maximum Likelihood
In this paper, we propose a novel information criteria-based approach to...

05/23/2019: Misspelling Oblivious Word Embeddings
In this paper we present a method to learn word embeddings that are resi...

07/11/2016: The Benefits of Word Embeddings Features for Active Learning in Clinical Information Extraction
This study investigates the use of unsupervised word embeddings and sequ...

04/19/2017: Predicting Role Relevance with Minimal Domain Expertise in a Financial Domain
Word embeddings have made enormous inroads in recent years in a wide var...
