Re-TACRED: Addressing Shortcomings of the TACRED Dataset

by George Stoica, et al.

TACRED is one of the largest and most widely used sentence-level relation extraction datasets. Proposed models that are evaluated using this dataset consistently set new state-of-the-art performance. However, they still exhibit large error rates despite leveraging external knowledge and unsupervised pretraining on large text corpora. A recent study suggested that this may be due to poor dataset quality. The study observed that over 50% of the most challenging sentences from the development and test sets are incorrectly labeled and account for an average drop of 8% f1-score in model performance. However, this study was limited to a small biased sample of 5k (out of a total of 106k) sentences, substantially restricting the generalizability and broader implications of its findings. In this paper, we address these shortcomings by: (i) performing a comprehensive study over the whole TACRED dataset, (ii) proposing an improved crowdsourcing strategy and deploying it to re-annotate the whole dataset, and (iii) performing a thorough analysis to understand how correcting the TACRED annotations affects previously published results. After verification, we observed that 23.9% of TACRED labels are incorrect. Moreover, evaluating several models on our revised dataset yields an average f1-score improvement of 14.3% and helps uncover significant relationships between the different models (rather than simply offsetting or scaling their scores by a constant factor). Finally, aside from our analysis, we also release Re-TACRED, a new completely re-annotated version of the TACRED dataset that can be used to perform reliable evaluation of relation extraction models.




TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task

TACRED (Zhang et al., 2017) is one of the largest, most widely used crow...

Relation Extraction with Explanation

Recent neural models for relation extraction with distant supervision al...

Towards Accurate and Consistent Evaluation: A Dataset for Distantly-Supervised Relation Extraction

In recent years, distantly-supervised relation extraction has achieved a...

DiS-ReX: A Multilingual Dataset for Distantly Supervised Relation Extraction

Distant supervision (DS) is a well established technique for creating la...

Revisiting DocRED – Addressing the Overlooked False Negative Problem in Relation Extraction

The DocRED dataset is one of the most popular and widely used benchmarks...

Semi-Automated Labeling of Requirement Datasets for Relation Extraction

Creating datasets manually by human annotators is a laborious task that ...

What do You Mean by Relation Extraction? A Survey on Datasets and Study on Scientific Relation Classification

Over the last five years, research on Relation Extraction (RE) witnessed...
