Learning to Denoise Distantly-Labeled Data for Entity Typing

05/04/2019
by   Yasumasa Onoe, et al.
Distantly-labeled data can be used to scale up training of statistical models, but it is typically noisy and that noise can vary with the distant labeling technique. In this work, we propose a two-stage procedure for handling this type of data: denoise it with a learned model, then train our final model on clean and denoised distant data with standard supervised training. Our denoising approach consists of two parts. First, a filtering function discards examples from the distantly labeled data that are wholly unusable. Second, a relabeling function repairs noisy labels for the retained examples. Each of these components is a model trained on synthetically-noised examples generated from a small manually-labeled set. We investigate this approach on the ultra-fine entity typing task of Choi et al. (2018). Our baseline model is an extension of their model with pre-trained ELMo representations, which already achieves state-of-the-art performance. Adding distant data that has been denoised with our learned models gives further performance gains over this base model, outperforming models trained on raw distant data or heuristically-denoised distant data.
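
The two-stage procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `add_synthetic_noise`, `make_denoiser_training_pairs`, `denoise`, and `build_training_set` are hypothetical, the random drop-and-add noise scheme is an assumption (the abstract does not specify how the synthetic noise is generated), and the `keep` and `relabel` callables stand in for the learned filtering and relabeling models.

```python
import random
from typing import Callable, List, Set, Tuple

# An example is (mention text, set of type labels). This representation is assumed.
Example = Tuple[str, Set[str]]


def add_synthetic_noise(gold: Set[str], vocab: List[str], rng: random.Random,
                        drop_prob: float = 0.3, add_prob: float = 0.1) -> Set[str]:
    """Corrupt a gold label set. Hypothetical scheme: random drops and additions."""
    kept = {t for t in sorted(gold) if rng.random() >= drop_prob}
    added = {t for t in vocab if t not in gold and rng.random() < add_prob}
    return kept | added


def make_denoiser_training_pairs(clean: List[Example], vocab: List[str],
                                 seed: int = 0) -> List[Tuple[str, Set[str], Set[str]]]:
    """Build (text, noisy labels, gold labels) triples from the small manually-labeled set."""
    rng = random.Random(seed)
    return [(text, add_synthetic_noise(gold, vocab, rng), gold) for text, gold in clean]


def denoise(distant: List[Example],
            keep: Callable[[str, Set[str]], bool],
            relabel: Callable[[str, Set[str]], Set[str]]) -> List[Example]:
    """Stage 1: discard unusable examples, then repair labels on the retained ones."""
    denoised = []
    for text, noisy in distant:
        if not keep(text, noisy):  # filtering function
            continue
        denoised.append((text, relabel(text, noisy)))  # relabeling function
    return denoised


def build_training_set(clean: List[Example], distant: List[Example],
                       keep: Callable[[str, Set[str]], bool],
                       relabel: Callable[[str, Set[str]], Set[str]]) -> List[Example]:
    """Stage 2 input: clean data plus denoised distant data, for standard supervised training."""
    return list(clean) + denoise(distant, keep, relabel)
```

In this sketch the triples from `make_denoiser_training_pairs` would be used to train the two models, and the final typing model would then be trained on the output of `build_training_set`.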

research
09/10/2021

Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training

We study the problem of training named entity recognition (NER) models u...
research
06/28/2020

BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision

We study the open-domain named entity recognition (NER) problem under di...
research
10/26/2020

Meta-Learning for Neural Relation Classification with Distant Supervision

Distant supervision provides a means to create a large number of weakly ...
research
10/18/2022

Denoising Enhanced Distantly Supervised Ultrafine Entity Typing

Recently, the task of distantly supervised (DS) ultra-fine entity typing...
research
05/17/2019

Distant Learning for Entity Linking with Automatic Noise Detection

Accurate entity linkers have been produced for domains and languages whe...
research
09/25/2017

EZLearn: Exploiting Organic Supervision in Large-Scale Data Annotation

Many real-world applications require large-scale data annotation, such a...
research
05/24/2010

Distantly Labeling Data for Large Scale Cross-Document Coreference

Cross-document coreference, the problem of resolving entity mentions acr...
