Better Word Embeddings by Disentangling Contextual n-Gram Information

04/10/2019 · by Prakhar Gupta, et al.
Pre-trained word vectors are ubiquitous in Natural Language Processing applications. In this paper, we show that training word embeddings jointly with bigram and even trigram embeddings results in improved unigram embeddings. We claim that training word embeddings along with higher n-gram embeddings helps remove contextual information from the unigrams, resulting in better stand-alone word embeddings. We empirically validate this hypothesis by outperforming competing word representation models by a significant margin on a wide variety of tasks. We make our models publicly available.
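To illustrate the general idea described in the abstract (not the authors' released implementation), the sketch below trains CBOW-style embeddings where the context of a target word is represented by the average of unigram, bigram, and trigram vectors. Because the n-gram vectors participate in the same training objective, they can absorb contextual, phrase-level information, and only the unigram rows are kept as the final word embeddings. All names (build_ngrams, DIM, the toy corpus, the full-softmax objective) are illustrative assumptions made for brevity.

```python
# Minimal sketch: joint unigram + bigram + trigram CBOW-style training on a toy corpus.
# Full softmax is used only because the vocabulary is tiny.
import numpy as np

rng = np.random.default_rng(0)

corpus = [
    "new york is a big city".split(),
    "machine learning needs a lot of data".split(),
    "new york has a lot of people".split(),
]

def build_ngrams(tokens, n):
    """Return the contiguous n-grams of a token sequence, joined with '_'."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# One joint vocabulary over unigrams, bigrams, and trigrams.
vocab = {}
for sent in corpus:
    for n in (1, 2, 3):
        for g in build_ngrams(sent, n):
            vocab.setdefault(g, len(vocab))

DIM, LR, EPOCHS, WINDOW = 32, 0.05, 200, 2
V = len(vocab)
emb_in = rng.normal(0, 0.1, (V, DIM))   # input vectors (unigrams and n-grams)
emb_out = rng.normal(0, 0.1, (V, DIM))  # output vectors for target words

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(EPOCHS):
    for sent in corpus:
        for pos, target in enumerate(sent):
            left = sent[max(0, pos - WINDOW):pos]
            right = sent[pos + 1:pos + 1 + WINDOW]
            # Context features: surrounding unigrams plus the n-grams they form.
            feats = left + right
            for n in (2, 3):
                feats += build_ngrams(left, n) + build_ngrams(right, n)
            ids = [vocab[f] for f in feats if f in vocab]
            if not ids:
                continue
            h = emb_in[ids].mean(axis=0)        # averaged context vector
            dscores = softmax(emb_out @ h)      # gradient of cross-entropy loss
            dscores[vocab[target]] -= 1.0       # w.r.t. the output scores
            grad_h = emb_out.T @ dscores
            emb_out -= LR * np.outer(dscores, h)
            emb_in[ids] -= LR * grad_h / len(ids)  # n-grams share the context gradient

# Only the unigram rows are kept as the stand-alone word embeddings.
word_vec = {w: emb_in[i] for w, i in vocab.items() if "_" not in w}
```

The intuition the sketch tries to capture: phrases such as "new york" get their own bigram vector, so the unigram vectors for "new" and "york" no longer have to encode that phrasal context themselves.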


Related research

09/30/2020 · Development of Word Embeddings for Uzbek Language
In this paper, we share the process of developing word embeddings for th...

01/14/2020 · Balancing the composition of word embeddings across heterogenous data sets
Word embeddings capture semantic relationships based on contextual infor...

08/25/2019 · On Measuring and Mitigating Biased Inferences of Word Embeddings
Word embeddings carry stereotypical connotations from the text they are ...

06/08/2021 · Obtaining Better Static Word Embeddings Using Contextual Embedding Models
The advent of contextual word embeddings – representations of words whic...

04/19/2017 · Predicting Role Relevance with Minimal Domain Expertise in a Financial Domain
Word embeddings have made enormous inroads in recent years in a wide var...

05/23/2017 · Second-Order Word Embeddings from Nearest Neighbor Topological Features
We introduce second-order vector representations of words, induced from ...

12/19/2017 · Any-gram Kernels for Sentence Classification: A Sentiment Analysis Case Study
Any-gram kernels are a flexible and efficient way to employ bag-of-n-gra...
