Second-Order Word Embeddings from Nearest Neighbor Topological Features

05/23/2017
by Denis Newman-Griffis, et al.

We introduce second-order vector representations of words, induced from nearest neighborhood topological features in pre-trained contextual word embeddings. We then analyze the effects of using second-order embeddings as input features in two deep natural language processing models, for named entity recognition and recognizing textual entailment, as well as a linear model for paraphrase recognition. Surprisingly, we find that nearest neighbor information alone is sufficient to capture most of the performance benefits derived from using pre-trained word embeddings. Furthermore, second-order embeddings are able to handle highly heterogeneous data better than first-order representations, though at the cost of some specificity. Additionally, augmenting contextual embeddings with second-order information further improves model performance in some cases. Due to variance in the random initializations of word embeddings, utilizing nearest neighbor features from multiple first-order embedding samples can also contribute to downstream performance gains. Finally, we identify intriguing characteristics of second-order embedding spaces for further research, including much higher density and different semantic interpretations of cosine similarity.
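The sketch below illustrates one plausible way to realize the idea in the abstract, under stated assumptions: build the k-nearest-neighbor graph of a pre-trained embedding matrix, discard the original coordinates so that only the who-neighbors-whom topology remains, and compress each word's neighbor profile into a dense second-order vector. The cosine k-NN step, the SVD compression, and all function names are illustrative assumptions, not the authors' exact pipeline.

```python
# Minimal sketch of inducing second-order vectors from nearest neighbor
# topology. Illustrative only: the SVD encoder is an assumption, not
# necessarily the paper's construction.
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors
from sklearn.decomposition import TruncatedSVD

def knn_adjacency(first_order, k=10):
    """Binary adjacency matrix of the k-NN graph under cosine distance.

    first_order: (vocab_size, d) matrix of pre-trained word vectors.
    """
    v = first_order.shape[0]
    nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(first_order)
    _, idx = nn.kneighbors(first_order)   # column 0 is each word itself
    rows = np.repeat(np.arange(v), k)
    cols = idx[:, 1:].ravel()             # drop the self-neighbor column
    data = np.ones(v * k, dtype=np.float32)
    return csr_matrix((data, (rows, cols)), shape=(v, v))

def second_order_embeddings(first_order, k=10, dim=100):
    """Keep only neighbor topology, then compress each word's neighbor
    profile into a dense dim-dimensional second-order vector."""
    return TruncatedSVD(n_components=dim).fit_transform(
        knn_adjacency(first_order, k))

def multi_sample_embeddings(samples, k=10, dim=100):
    """Average k-NN graphs from several independently trained first-order
    embedding samples before compressing (one simple reading of the
    multi-sample result mentioned in the abstract)."""
    adj = sum(knn_adjacency(s, k) for s in samples) / len(samples)
    return TruncatedSVD(n_components=dim).fit_transform(adj)
```

The multi-sample variant mirrors the abstract's observation that, because random initialization makes individual embedding runs vary, pooling nearest neighbor features across several first-order samples can help downstream; averaging their adjacency graphs is one straightforward way to combine them.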


Related research

04/10/2019 · Better Word Embeddings by Disentangling Contextual n-Gram Information
Pre-trained word vectors are ubiquitous in Natural Language Processing a...

02/29/2020 · Understanding the Downstream Instability of Word Embeddings
Many industrial machine learning (ML) systems require frequent retrainin...

06/23/2021 · Clinical Named Entity Recognition using Contextualized Token Representations
The clinical named entity recognition (CNER) task seeks to locate and cl...

10/01/2019 · Improved Word Sense Disambiguation Using Pre-Trained Contextualized Word Representations
Contextualized word representations are able to give different represent...

08/31/2020 · Discovering Bilingual Lexicons in Polyglot Word Embeddings
Bilingual lexicons and phrase tables are critical resources for modern M...

11/29/2020 · Improved Semantic Role Labeling using Parameterized Neighborhood Memory Adaptation
Deep neural models achieve some of the best results for semantic role la...

06/23/2020 · Supervised Understanding of Word Embeddings
Pre-trained word embeddings are widely used for transfer learning in nat...
