Entity Extraction from Wikipedia List Pages

03/11/2020
by Nicolas Heist, et al.

When it comes to factual knowledge about a wide range of domains, Wikipedia is often the prime source of information on the web. DBpedia and YAGO, as large cross-domain knowledge graphs, encode a subset of that knowledge by creating an entity for each page in Wikipedia and connecting these entities through edges. It is well known, however, that Wikipedia-based knowledge graphs are far from complete. In particular, because Wikipedia's policies permit pages only about subjects that have a certain popularity, such graphs tend to lack information about less well-known entities. Information about these entities is often available in the encyclopedia, but not represented as an individual page. In this paper, we present a two-phased approach for the extraction of entities from Wikipedia's list pages, which have proven to be a valuable source of information. In the first phase, we build a large taxonomy from categories and list pages with DBpedia as a backbone. With distant supervision, we extract training data for the identification of new entities in list pages, which we use in the second phase to train a classification model. With this approach, we extract over 700k new entities and extend DBpedia with 7.5M new type statements and 3.8M new facts of high precision.
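To make the distant-supervision step more concrete, the following is a minimal sketch of the general idea and not the authors' implementation. It assumes that each list page has already been mapped to a target type in the taxonomy and that some list entries link to pages whose types are already known in the knowledge graph. The names `ListEntry`, `label_entries`, `KNOWN_TYPES`, and `LIST_TYPE`, as well as the toy data, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ListEntry:
    list_page: str               # title of the list page the entry appears on
    text: str                    # surface text of the entry
    linked_page: Optional[str]   # title of the linked page, if the entry has one


def label_entries(
    entries: List[ListEntry],
    known_types: Dict[str, Set[str]],
    list_type: Dict[str, str],
) -> Tuple[List[Tuple[ListEntry, bool]], List[ListEntry]]:
    """Split entries into distantly labeled examples and unlabeled candidates.

    An entry becomes a labeled example when it links to a page whose types
    are already known: the label is True if those types include the type
    assumed for its list page, False otherwise. All remaining entries are
    kept as candidates for a classifier trained on the labeled examples.
    """
    labeled: List[Tuple[ListEntry, bool]] = []
    candidates: List[ListEntry] = []
    for entry in entries:
        target = list_type.get(entry.list_page)
        types = known_types.get(entry.linked_page) if entry.linked_page else None
        if target is None or types is None:
            candidates.append(entry)
        else:
            labeled.append((entry, target in types))
    return labeled, candidates


# Toy data (hypothetical, for illustration only).
KNOWN_TYPES = {
    "Pianist A": {"Person", "MusicalArtist"},
    "Record Label B": {"Organisation", "RecordLabel"},
}
LIST_TYPE = {"List of jazz pianists": "MusicalArtist"}
ENTRIES = [
    ListEntry("List of jazz pianists", "Pianist A", "Pianist A"),
    ListEntry("List of jazz pianists", "Record Label B", "Record Label B"),
    ListEntry("List of jazz pianists", "Pianist C", None),
]

labeled, candidates = label_entries(ENTRIES, KNOWN_TYPES, LIST_TYPE)
```

In this sketch, the labeled pairs would serve as training data for a classifier, and the entries in `candidates` are the ones such a classifier would later judge as new entities or not.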


