Distant IE by Bootstrapping Using Lists and Document Structure

01/04/2016
by Lidong Bing, et al.

Distant labeling for information extraction (IE) suffers from noisy training data. We describe a way of reducing the noise associated with distant IE by identifying coupling constraints between potential instance labels. As one example of coupling, items in a list are likely to have the same label. A second example of coupling comes from analysis of document structure: in some corpora, sections can be identified such that items in the same section are likely to have the same label. Such sections do not exist in all corpora, but we show that augmenting a large corpus with coupling constraints from even a small, well-structured corpus can improve performance substantially, doubling F1 on one task.
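
To make the list-coupling idea concrete, here is a minimal sketch that gives every item in a list the majority distant label found in that list. It is an illustration only, not the method used in the paper: the function name `smooth_labels_by_list`, the input format, and the majority-vote rule are all assumptions made for this example.

```python
from collections import Counter


def smooth_labels_by_list(lists, distant_labels):
    """Illustrative list-coupling heuristic (an assumption, not the paper's method).

    lists: iterable of lists of item strings (hypothetical input format).
    distant_labels: dict mapping some items to a noisy distant label.
    Returns a new dict in which every item of a list with at least one
    labeled member carries that list's majority label.
    """
    smoothed = dict(distant_labels)
    for items in lists:
        # Count the distant labels observed among this list's items.
        votes = Counter(distant_labels[i] for i in items if i in distant_labels)
        if not votes:
            continue  # no labeled item in this list, so nothing to propagate
        majority, _ = votes.most_common(1)[0]
        # Coupling constraint: items in the same list share one label.
        for item in items:
            smoothed[item] = majority
    return smoothed


# Hypothetical data: "rash" is mislabeled and "fatigue" is unlabeled.
example_lists = [["nausea", "headache", "rash", "fatigue"]]
example_labels = {
    "nausea": "side_effect",
    "headache": "side_effect",
    "rash": "disease",
    "aspirin": "drug",
}
smoothed = smooth_labels_by_list(example_lists, example_labels)
```

In this toy input the list vote overrides the noisy label on "rash" and supplies a label for "fatigue", while "aspirin", which appears in no list, keeps its original label. A section-level constraint could be sketched the same way by grouping items by section instead of by list.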


