Extreme Extraction: Only One Hour per Relation

by Raphael Hoffmann, et al.
University of Washington

Information Extraction (IE) aims to automatically generate a large knowledge base from natural language text, but progress remains slow. Supervised learning requires copious human annotation, while unsupervised and weakly supervised approaches do not deliver competitive accuracy. As a result, most fielded applications of IE, as well as the leading TAC-KBP systems, rely on significant amounts of manual engineering. Even "Extreme" methods, such as those reported in Freedman et al. 2011, require about 10 hours of expert labor per relation. This paper shows how to reduce that effort by an order of magnitude. We present a novel system, InstaRead, that streamlines authoring with an ensemble of methods: 1) encoding extraction rules in an expressive and compositional representation, 2) guiding the user to promising rules based on corpus statistics and mined resources, and 3) introducing a new interactive development cycle that provides immediate feedback, even on large datasets. Experiments show that experts can create quality extractors in under an hour and even NLP novices can author good extractors. These extractors equal or outperform ones obtained by comparably supervised and state-of-the-art distantly supervised approaches.




