Extreme Extraction: Only One Hour per Relation

06/21/2015
by Raphael Hoffmann, et al.

Information Extraction (IE) aims to automatically generate a large knowledge base from natural language text, but progress remains slow. Supervised learning requires copious human annotation, while unsupervised and weakly supervised approaches do not deliver competitive accuracy. As a result, most fielded applications of IE, as well as the leading TAC-KBP systems, rely on significant amounts of manual engineering. Even "Extreme" methods, such as those reported in Freedman et al. (2011), require about 10 hours of expert labor per relation. This paper shows how to reduce that effort by an order of magnitude. We present a novel system, InstaRead, that streamlines authoring with an ensemble of methods: 1) encoding extraction rules in an expressive and compositional representation, 2) guiding the user to promising rules based on corpus statistics and mined resources, and 3) introducing a new interactive development cycle that provides immediate feedback, even on large datasets. Experiments show that experts can create quality extractors in under an hour and that even NLP novices can author good extractors. These extractors equal or outperform those obtained by comparably supervised and state-of-the-art distantly supervised approaches.

research
03/18/2022

PRBoost: Prompt-Based Rule Discovery and Boosting for Interactive Weakly-Supervised Learning

Weakly-supervised learning (WSL) has shown promising results in addressi...
research
11/09/2017

Weakly-supervised Relation Extraction by Pattern-enhanced Embedding Learning

Extracting relations from text corpora is an important task in text mini...
research
04/10/2022

MedDistant19: A Challenging Benchmark for Distantly Supervised Biomedical Relation Extraction

Relation Extraction in the biomedical domain is challenging due to the l...
research
03/12/2021

A Review on Semi-Supervised Relation Extraction

Relation extraction (RE) plays an important role in extracting knowledge...
research
11/07/2020

SeqGenSQL – A Robust Sequence Generation Model for Structured Query Language

We explore using T5 (Raffel et al. (2019)) to directly translate natural...
research
06/30/2018

A New Benchmark and Progress Toward Improved Weakly Supervised Learning

Knowledge Matters: Importance of Prior Information for Optimization [7],...
research
08/21/2019

Populating Web Scale Knowledge Graphs using Distantly Supervised Relation Extraction and Validation

In this paper, we propose a fully automated system to extend knowledge g...
