SWAT: A System for Detecting Salient Wikipedia Entities in Texts

04/10/2018
by Marco Ponza, et al.

We study the problem of entity salience by proposing the design and implementation of SWAT, a system that identifies the salient Wikipedia entities occurring in an input document. SWAT consists of several modules that detect Wikipedia entities on the fly and classify them as salient or not. The classification relies on a large number of syntactic, semantic and latent features, extracted via a supervised process trained over millions of examples drawn from the New York Times corpus. We validate SWAT through a large experimental assessment, which shows that it improves on known solutions over all publicly available datasets. We release SWAT via an API, which we describe and discuss in the paper to ease its use in other software.
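The abstract describes a pipeline in which entities detected in a document are turned into feature vectors and passed to a supervised classifier that labels each one salient or not. Below is a minimal Python sketch of that classification step, under loudly stated assumptions. The `EntityMention` record, the two features (log-scaled mention count and how early the first mention appears), the toy training examples and the perceptron classifier are all hypothetical stand-ins chosen for brevity. They are not SWAT's actual features, model, training data or API.

```python
import math
from dataclasses import dataclass

@dataclass
class EntityMention:
    """Hypothetical record for one Wikipedia entity already detected in a document."""
    title: str          # Wikipedia page title
    mention_count: int  # number of mentions of the entity in the document
    first_offset: int   # character offset of the first mention

def features(e: EntityMention, doc_length: int) -> list[float]:
    """Two illustrative features; SWAT itself uses many more."""
    return [
        1.0,                                        # bias term
        math.log1p(e.mention_count),                # log-scaled mention frequency
        1.0 - e.first_offset / max(doc_length, 1),  # 1.0 = first mention at the very start
    ]

def score(w: list[float], x: list[float]) -> float:
    """Linear score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_perceptron(xs: list[list[float]], ys: list[int], epochs: int = 100) -> list[float]:
    """Classic perceptron (a stand-in model): labels are +1 (salient) or -1 (non-salient)."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            if y * score(w, x) <= 0:  # misclassified or on the boundary: move w toward y*x
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # every training example is on the correct side
            break
    return w

def is_salient(w: list[float], x: list[float]) -> bool:
    """Positive score means the entity is predicted salient."""
    return score(w, x) > 0

# Toy, made-up training examples (not NYT data), all from a 1000-character document.
DOC_LEN = 1000
train = [
    (EntityMention("Entity_A", 5, 0), +1),
    (EntityMention("Entity_B", 4, 50), +1),
    (EntityMention("Entity_C", 1, 900), -1),
    (EntityMention("Entity_D", 1, 700), -1),
]
xs = [features(e, DOC_LEN) for e, _ in train]
ys = [y for _, y in train]
w = train_perceptron(xs, ys)

print(is_salient(w, features(EntityMention("Entity_E", 6, 10), DOC_LEN)))   # True
print(is_salient(w, features(EntityMention("Entity_F", 1, 950), DOC_LEN)))  # False
```

In this toy setup, entities mentioned often and early end up on the salient side of the learned boundary. SWAT's real feature set is far richer, and its entity detection module (assumed already run here) is a separate component of the system.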


research
05/06/2017

Learning Distributed Representations of Texts and Entities from Knowledge Base

We describe a neural network model that jointly learns distributed repre...
research
06/04/2019

Boosting Entity Linking Performance by Leveraging Unlabeled Documents

Modern entity linking systems rely on large collections of documents spe...
research
02/10/2021

Information Extraction From Co-Occurring Similar Entities

Knowledge about entities and their interrelations is a crucial factor of...
research
09/01/2021

Pattern-based Acquisition of Scientific Entities from Scholarly Article Titles

We describe a rule-based approach for the automatic acquisition of salie...
research
08/21/2023

Software Entity Recognition with Noise-Robust Learning

Recognizing software entities such as library names from free-form text ...
research
02/22/2023

Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities

Large-scale multi-modal pre-training models such as CLIP and PaLI exhibi...
research
02/03/2017

Insights into Entity Name Evolution on Wikipedia

Working with Web archives raises a number of issues caused by their temp...
