Pre-trained Language Models as Re-Annotators

05/11/2022
by Chang Shu et al.

Annotation noise is widespread in datasets, but manually revising a flawed corpus is time-consuming and error-prone. Hence, given the prior knowledge in Pre-trained Language Models and the expected uniformity across all annotations, we attempt to automatically reduce annotation noise in the corpus through two tasks: (1) Annotation Inconsistency Detection, which indicates the credibility of annotations, and (2) Annotation Error Correction, which rectifies abnormal annotations. We investigate how to acquire semantically sensitive annotation representations from Pre-trained Language Models, expecting examples with identical annotations to be embedded in mutually adjacent positions even without fine-tuning. We propose a novel credibility score that reveals the likelihood of annotation inconsistencies based on neighbouring consistency. We then fine-tune a Pre-trained Language Model based classifier with cross-validation for annotation correction. The annotation corrector is further elaborated with two approaches: (1) soft labelling by Kernel Density Estimation and (2) a novel distant-peer contrastive loss. We study re-annotation in relation extraction and create a new manually revised dataset, Re-DocRED, for evaluating document-level re-annotation. The proposed credibility scores show promising agreement with human revisions, achieving Binary F1 of 93.4 and 72.5 in detecting inconsistencies on TACRED and DocRED respectively. Moreover, the neighbour-aware classifiers based on distant-peer contrastive learning and uncertain labels achieve Macro F1 of up to 66.2 and 57.8 in correcting annotations on TACRED and DocRED respectively. These improvements are not merely theoretical: state-of-the-art relation extraction models trained on the automatically denoised training sets improve by up to 3.6 points.
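The abstract does not spell out the credibility score, but the idea of neighbouring consistency can be illustrated with a minimal sketch: score each example by the fraction of its k nearest neighbours (here, by cosine similarity of frozen PLM embeddings) that carry the same annotation. The function name credibility_scores, the similarity measure, and the value of k below are illustrative assumptions, not the paper's exact formulation.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def credibility_scores(embeddings: np.ndarray, labels: np.ndarray, k: int = 10) -> np.ndarray:
    # embeddings: (n, d) sentence/example vectors from a frozen pre-trained LM
    # labels:     (n,) annotation ids; low scores flag likely inconsistencies
    sims = cosine_similarity(embeddings)      # (n, n) pairwise cosine similarities
    np.fill_diagonal(sims, -np.inf)           # never count an example as its own neighbour
    scores = np.empty(len(labels))
    for i in range(len(labels)):
        nearest = np.argsort(-sims[i])[:k]                  # indices of the k most similar examples
        scores[i] = np.mean(labels[nearest] == labels[i])   # label agreement among neighbours
    return scores

Under this reading, the examples with the lowest scores are the ones flagged as inconsistent and passed on to Annotation Error Correction.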



research · 11/01/2019
Deep Bidirectional Transformers for Relation Extraction without Supervision
We present a novel framework to deal with relation extraction tasks in c...

research · 06/19/2019
Fine-tuning Pre-Trained Transformer Language Models to Distantly Supervised Relation Extraction
Distantly supervised relation extraction is widely used to extract relat...

research · 06/07/2019
Improving Relation Extraction by Pre-trained Language Representations
Current state-of-the-art relation extraction methods typically rely on a...

research · 07/31/2022
Improving Distantly Supervised Relation Extraction by Natural Language Inference
To reduce human annotations for relation extraction (RE) tasks, distantl...

research · 02/12/2023
Discourse Structure Extraction from Pre-Trained and Fine-Tuned Language Models in Dialogues
Discourse processing suffers from data sparsity, especially for dialogue...

research · 08/26/2022
GRASP: Guiding model with RelAtional Semantics using Prompt for Dialogue Relation Extraction
The dialogue-based relation extraction (DialogRE) task aims to predict t...

research · 08/05/2022
Construction of English Resume Corpus and Test with Pre-trained Language Models
Information extraction (IE) has always been one of the essential tasks of...
