Word Embeddings for Entity-annotated Texts

02/06/2019
by   Satya Almasian, et al.

Word embeddings are useful for many information retrieval and natural language processing tasks due to their ability to capture lexical semantics. However, while many such tasks involve or even rely on named entities as central components, popular word embedding models have so far failed to include entities as first-class citizens. Although it seems intuitive that annotating named entities in the training corpus should result in more informative word features for downstream tasks, performance issues arise when popular embedding approaches are naively applied to entity-annotated corpora. Not only are the resulting entity embeddings less useful than expected, but the performance of the non-entity word embeddings also degrades in comparison to those trained on the raw, unannotated corpus. In this paper, we investigate approaches to jointly train word and entity embeddings on a large corpus with automatically annotated and linked entities. We discuss two distinct approaches to the generation of such embeddings, namely training state-of-the-art embeddings on raw and annotated versions of the corpus, and computing node embeddings of a co-occurrence graph representation of the annotated corpus. We compare the performance of annotated embeddings and classical word embeddings on a variety of word similarity, analogy, and clustering evaluation tasks, and investigate their performance in entity-specific tasks. Our findings show that it takes more than training popular word embedding models on an annotated corpus to create entity embeddings with acceptable performance on common test cases. Based on these results, we discuss how and when node embeddings of the co-occurrence graph representation of the text can restore the performance.
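To make the first of the two approaches concrete, the sketch below shows one way to train word and entity vectors jointly with a standard skip-gram model (gensim's Word2Vec, 4.x API) on an entity-annotated corpus in which each linked mention has been collapsed into a single token. The "ENT/<title>" token convention and the toy sentences are illustrative assumptions, not the paper's actual preprocessing pipeline.

```python
# Minimal sketch (not the authors' exact setup): jointly training word and
# entity vectors by running skip-gram over an entity-annotated corpus in
# which each linked mention is replaced by a single token.
from gensim.models import Word2Vec

# Toy annotated corpus; the "ENT/<Wikipedia_title>" scheme is a hypothetical
# convention for collapsing a linked mention into one vocabulary item.
annotated_sentences = [
    ["ENT/Barack_Obama", "was", "elected", "president", "of", "the",
     "ENT/United_States", "in", "2008"],
    ["the", "ENT/United_States", "capital", "is", "ENT/Washington,_D.C."],
]

model = Word2Vec(
    sentences=annotated_sentences,
    vector_size=100,   # embedding dimensionality
    window=5,          # context window size
    min_count=1,       # keep rare tokens in this toy example
    sg=1,              # skip-gram variant
)

# Words and entities now live in the same vector space.
print(model.wv.most_similar("ENT/United_States", topn=3))
```

Collapsing each mention into one token is what places words and entities in a shared vector space; the abstract's point is that this naive setup alone tends to hurt both the entity and the word vectors, which motivates the co-occurrence-graph node embeddings as an alternative.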
