OPIEC: An Open Information Extraction Corpus

04/28/2019
by   Kiril Gashteovski, et al.

Open information extraction (OIE) systems extract relations and their arguments from natural language text in an unsupervised manner. The resulting extractions are a valuable resource for downstream tasks such as knowledge base construction, open question answering, or event schema induction. In this paper, we release, describe, and analyze an OIE corpus called OPIEC, which was extracted from the text of English Wikipedia. OPIEC complements the available OIE resources: it is the largest OIE corpus publicly available to date (over 340M triples) and contains valuable metadata such as provenance information, confidence scores, linguistic annotations, and semantic annotations including spatial and temporal information. We analyze the OPIEC corpus by comparing its content with knowledge bases such as DBpedia or YAGO, which are also based on Wikipedia. We find that most of the facts between entities present in OPIEC cannot be found in DBpedia and/or YAGO, that OIE facts often differ from knowledge base facts in their level of specificity, and that open relations in OIE are generally highly polysemous. We believe that the OPIEC corpus is a valuable resource for future research on automated knowledge base construction.

research
11/13/2022

mOKB6: A Multilingual Open Knowledge Base Completion Benchmark

Automated completion of open knowledge bases (KBs), which are constructe...
research
12/21/2022

ImPaKT: A Dataset for Open-Schema Knowledge Base Construction

Large language models have ushered in a golden age of semantic parsing. ...
research
10/03/2022

Russian Web Tables: A Public Corpus of Web Tables for Russian Language Based on Wikipedia

Corpora that contain tabular data such as WebTables are a vital resource...
research
02/15/2018

Open Information Extraction on Scientific Text: An Evaluation

Open Information Extraction (OIE) is the task of the unsupervised creati...
research
07/10/2018

Enriching Knowledge Bases with Counting Quantifiers

Information extraction traditionally focuses on extracting relations bet...
research
06/12/2020

Do Dogs have Whiskers? A New Knowledge Base of hasPart Relations

We present a new knowledge-base of hasPart relationships, extracted from...
research
03/17/2018

Tell Me Why Is It So? Explaining Knowledge Graph Relationships by Finding Descriptive Support Passages

We address the problem of finding descriptive explanations of facts stor...
