Revisiting DocRED – Addressing the Overlooked False Negative Problem in Relation Extraction

05/25/2022
by Qingyu Tan, et al.

The DocRED dataset is one of the most popular and widely used benchmarks for document-level relation extraction (RE). It adopts a recommend-revise annotation scheme in order to build a large-scale annotated dataset. However, we find that the annotation of DocRED is incomplete, i.e., false negative samples are prevalent. We analyze the causes and effects of this overwhelming false negative problem in the DocRED dataset. To address the shortcoming, we re-annotate 4,053 documents in the DocRED dataset by adding the missed relation triples back to the original DocRED. We name our revised dataset Re-DocRED. We conduct extensive experiments with state-of-the-art neural models on both datasets, and the experimental results show that models trained and evaluated on Re-DocRED achieve performance improvements of around 13 F1 points. Moreover, we propose different metrics to comprehensively evaluate the document-level RE task. We make our data publicly available at https://github.com/tonytan48/Re-DocRED.
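
To illustrate why missing gold triples matter for evaluation, here is a minimal sketch that scores the same predictions against an incomplete and a complete gold set. The function name `triple_f1` and the toy triples are hypothetical and are not taken from DocRED, Re-DocRED, or the authors' evaluation code; it assumes a plain set-based F1 over (head, relation, tail) triples.

```python
def triple_f1(predicted, gold):
    """Precision, recall, and F1 over sets of (head, relation, tail) triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy document: three relations actually hold in the text.
complete_gold = {
    ("Paris", "capital_of", "France"),
    ("France", "contains", "Paris"),
    ("Seine", "located_in", "France"),
}
# An incomplete annotation that missed two of them (false negatives).
incomplete_gold = {("Paris", "capital_of", "France")}

# A model that predicts every true relation.
predicted = set(complete_gold)

scores_incomplete = triple_f1(predicted, incomplete_gold)
scores_complete = triple_f1(predicted, complete_gold)

print("against incomplete gold:", scores_incomplete)
print("against complete gold:  ", scores_complete)
```

In this toy case the two correct predictions that are absent from the incomplete gold set are counted as errors, so precision and F1 drop even though the model made no mistake. This is only an illustration of the mechanism, not a reproduction of the reported results.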

research
04/17/2022

Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED

DocRED is a widely used dataset for document-level relation extraction. ...
research
06/20/2023

Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction

Document-level relation extraction (DocRE) attracts more research intere...
research
06/16/2023

Class-Adaptive Self-Training for Relation Extraction with Incompletely Annotated Training Data

Relation extraction (RE) aims to extract relations from sentences and do...
research
05/21/2021

Revisiting the Negative Data of Distantly Supervised Relation Extraction

Distantly supervision automatically generates plenty of training samples...
research
04/03/2023

Towards Integration of Discriminability and Robustness for Document-Level Relation Extraction

Document-level relation extraction (DocRE) predicts relations for entity...
research
10/21/2022

SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval

Sampling proper negatives from a large document pool is vital to effecti...
research
07/10/2023

HistRED: A Historical Document-Level Relation Extraction Dataset

Despite the extensive applications of relation extraction (RE) tasks in ...
