EnCore: Pre-Training Entity Encoders using Coreference Chains

05/22/2023
by Frank Mtumbuka, et al.

Entity typing is the task of assigning semantic types to the entities mentioned in a text. Because obtaining sufficient manual annotations is expensive, current state-of-the-art methods are typically trained on automatically labelled datasets, e.g. by exploiting links between Wikipedia pages. In this paper, we propose to use coreference chains as an additional supervision signal. Specifically, we pre-train an entity encoder using a contrastive loss, such that the embeddings of coreferring entities are more similar to each other than to the embeddings of other entities. Since this strategy is not tied to Wikipedia, we can pre-train our entity encoder on genres other than encyclopedic text, and on larger amounts of data. Our experimental results show that the proposed pre-training strategy improves the state of the art in fine-grained entity typing, provided that only high-quality coreference links are exploited.
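To make the contrastive objective concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not give the exact loss, so this assumes an InfoNCE-style formulation with cosine similarity, and the names `contrastive_loss`, `chain_ids`, and `temperature` are hypothetical. Mentions that share a coreference chain id are treated as positives, and all other mentions in the batch as negatives.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(embeddings, chain_ids, temperature=0.1):
    """InfoNCE-style loss (an assumed formulation, not taken from the paper).

    embeddings: one vector per entity mention.
    chain_ids:  coreference chain id for each mention; equal ids = coreferring.
    Returns the mean loss over all (anchor, positive) pairs, or 0.0 if no
    mention has a coreferring partner in the batch.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and chain_ids[j] == chain_ids[i]]
        if not positives:
            continue  # anchor has no coreferring mention in this batch
        # Exponentiated, temperature-scaled similarity to every other mention.
        scores = {j: math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
                  for j in range(n) if j != i}
        denom = sum(scores.values())
        for j in positives:
            losses.append(-math.log(scores[j] / denom))
    return sum(losses) / len(losses) if losses else 0.0


# Toy batch: four mentions forming two well-separated clusters.
mentions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = contrastive_loss(mentions, [0, 0, 1, 1])    # chains follow clusters
loss_mismatched = contrastive_loss(mentions, [0, 1, 0, 1])  # chains cut across clusters
```

In this toy batch the loss is lower when the chain ids agree with the geometry of the embeddings, which is the behaviour the pre-training objective rewards. A real encoder would compute the same quantity on batched tensors and backpropagate through it.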

research
09/01/2019

Pre-training of Deep Contextualized Embeddings of Words and Entities for Named Entity Disambiguation

Deep contextualized embeddings trained using unsupervised language model...
research
04/15/2021

Planning with Entity Chains for Abstractive Summarization

Pre-trained transformer-based sequence-to-sequence models have become th...
research
05/02/2023

KEPLET: Knowledge-Enhanced Pretrained Language Model with Topic Entity Awareness

In recent years, Pre-trained Language Models (PLMs) have shown their sup...
research
09/14/2023

MMEAD: MS MARCO Entity Annotations and Disambiguations

MMEAD, or MS MARCO Entity Annotations and Disambiguations, is a resource...
research
11/23/2018

Fine Grained Classification of Personal Data Entities

Entity Type Classification can be defined as the task of assigning categ...
research
08/21/2023

Software Entity Recognition with Noise-Robust Learning

Recognizing software entities such as library names from free-form text ...
research
02/22/2023

Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

Large-scale multi-modal pre-training models such as CLIP and PaLI exhibi...
