An Annotated Corpus of Webtables for Information Extraction Tasks

08/18/2020
by Erin Macdonald, et al.

Information Extraction is a well-researched area of Natural Language Processing, with applications in web search and question answering. It is concerned with identifying entities and the relationships between them as expressed in a given context, usually a sentence or a paragraph of running text. Given the importance of the task, several datasets and benchmarks have been curated over the years. However, focusing on running text alone leaves out tables, which are common in many structured documents and in which pairs of entities also co-occur in context (e.g., the same row of the table). While there are recent papers on relation extraction from tables in the literature, their experimental evaluations have been on ad-hoc datasets for lack of a standard benchmark. This paper helps close that gap. We introduce an annotation framework and a dataset of 217,834 tables from Wikipedia which are annotated with 28 relations, using both classifiers and carefully designed queries over a reference knowledge graph. Binary classifiers are then applied to the resulting dataset to remove false positives, resulting in an average annotation accuracy of 94%. The resulting dataset is the first of its kind to be made publicly available.
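To make the row-level co-occurrence idea concrete, the following is a minimal sketch of how column pairs in a table might be labeled by looking up entity pairs in a knowledge graph. It is not the paper's annotation framework: the `TOY_KG` dictionary, the `annotate_column_pairs` function, and the `min_support` threshold are all hypothetical stand-ins, and the classifier-based filtering of false positives mentioned in the abstract is omitted.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy knowledge graph: (subject, object) -> relation.
# A real system would query a reference knowledge graph instead.
TOY_KG = {
    ("Inception", "Christopher Nolan"): "director",
    ("Interstellar", "Christopher Nolan"): "director",
    ("Alien", "Ridley Scott"): "director",
}


def annotate_column_pairs(rows, kg, min_support=0.5):
    """Label ordered column pairs (i, j) with the relation that the
    knowledge graph asserts for enough of the same-row cell pairs.

    min_support is an assumed threshold: the fraction of rows whose
    (cell_i, cell_j) pair must carry the winning relation in the graph.
    """
    if not rows:
        return {}
    n_cols = len(rows[0])
    labels = {}
    for i, j in permutations(range(n_cols), 2):
        votes = Counter()
        for row in rows:
            # Entities co-occurring in the same row form a candidate pair.
            relation = kg.get((row[i], row[j]))
            if relation is not None:
                votes[relation] += 1
        if votes:
            relation, count = votes.most_common(1)[0]
            if count / len(rows) >= min_support:
                labels[(i, j)] = relation
    return labels


table = [
    ["Inception", "2010", "Christopher Nolan"],
    ["Interstellar", "2014", "Christopher Nolan"],
    ["Alien", "1979", "Ridley Scott"],
    ["Unknown Film", "1999", "Someone Else"],
]

print(annotate_column_pairs(table, TOY_KG))  # {(0, 2): 'director'}
```

In this sketch the fourth row has no match in the toy graph, so the pair (0, 2) is supported by three of four rows and still clears the assumed 0.5 threshold. Labels produced this way are noisy by construction, which is the kind of error a later false-positive filter would need to address.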

research
02/10/2021

Information Extraction From Co-Occurring Similar Entities

Knowledge about entities and their interrelations is a crucial factor of...
research
08/18/2021

RTE: A Tool for Annotating Relation Triplets from Text

In this work, we present a Web-based annotation tool `Relation Triplets ...
research
10/05/2020

TabEAno: Table to Knowledge Graph Entity Annotation

In the Open Data era, a large number of table resources have been made a...
research
05/02/2022

Biographical: A Semi-Supervised Relation Extraction Dataset

Extracting biographical information from online documents is a popular r...
research
08/24/2021

Relation Extraction from Tables using Artificially Generated Metadata

Relation Extraction (RE) from tables is the task of identifying relation...
research
07/11/2023

Relational Extraction on Wikipedia Tables using Convolutional and Memory Networks

Relation extraction (RE) is the task of extracting relations between ent...
research
07/17/2022

An Overview of Distant Supervision for Relation Extraction with a Focus on Denoising and Pre-training Methods

Relation Extraction (RE) is a foundational task of natural language proc...
