Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction

06/20/2023
by   Haotian Chen, et al.

Document-level relation extraction (DocRE) has attracted increasing research interest in recent years. While models achieve consistent performance gains on DocRE, their underlying decision rules remain understudied: do they make the right predictions according to rationales? In this paper, we take the first step toward answering this question and introduce a new perspective for comprehensively evaluating a model. Specifically, we first conduct annotations to provide the rationales that humans rely on in DocRE. Our investigations then reveal that, in contrast to humans, representative state-of-the-art (SOTA) DocRE models follow different decision rules. Through our proposed RE-specific attacks, we further demonstrate that this significant discrepancy between model and human decision rules severely damages model robustness and renders the models inapplicable to real-world RE scenarios. Finally, we introduce mean average precision (MAP) to evaluate the understanding and reasoning capabilities of models. Based on extensive experimental results, we appeal to future work to evaluate both the performance and the understanding ability of models when developing their applications. We make our annotations and code publicly available.
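The abstract does not spell out how MAP is applied to rationales, but the metric itself is standard: rank candidate evidence units by model saliency, then average precision at the ranks where human-annotated rationale units appear. Below is a minimal sketch under that assumption; the instance structure, the notion of "unit" (token vs. sentence), and the toy data are hypothetical placeholders, not the paper's protocol.

```python
# Minimal sketch: mean average precision (MAP) over rationale rankings.
# Hypothetical setup: for each relation instance, a model scores candidate
# evidence units (tokens or sentences), and human annotations mark which
# units are gold rationales. The paper's exact matching protocol may differ.

from typing import List, Set

def average_precision(ranked_units: List[int], gold: Set[int]) -> float:
    """AP for one instance: precision at each rank where a gold unit appears."""
    if not gold:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, unit in enumerate(ranked_units, start=1):
        if unit in gold:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(gold)

def mean_average_precision(all_ranked: List[List[int]],
                           all_gold: List[Set[int]]) -> float:
    """MAP: average of per-instance AP over all annotated instances."""
    aps = [average_precision(r, g) for r, g in zip(all_ranked, all_gold)]
    return sum(aps) / len(aps) if aps else 0.0

# Toy usage: two instances; units are ranked by model saliency (descending).
ranked = [[3, 0, 7, 2], [5, 1, 4]]   # unit indices, most salient first
gold = [{0, 2}, {4}]                 # human-annotated rationale units
print(f"MAP = {mean_average_precision(ranked, gold):.3f}")  # MAP = 0.417
```

A high MAP under this reading means the units a model attends to most are the same ones humans cite as rationales, which is how the metric separates "right answer" from "right answer for the right reason".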


Related research

05/25/2022 · Revisiting DocRED – Addressing the Overlooked False Negative Problem in Relation Extraction
The DocRED dataset is one of the most popular and widely used benchmarks...

11/17/2021 · Multi-Attribute Relation Extraction (MARE) – Simplifying the Application of Relation Extraction
Natural language understanding's relation extraction makes innovative an...

06/15/2023 · Rethinking Document-Level Relation Extraction: A Reality Check
Recently, numerous efforts have continued to push up performance boundar...

04/09/2023 · Are Large Language Models Ready for Healthcare? A Comparative Study on Clinical Language Understanding
Large language models (LLMs) have made significant progress in various d...

07/10/2023 · HistRED: A Historical Document-Level Relation Extraction Dataset
Despite the extensive applications of relation extraction (RE) tasks in ...

12/14/2020 · Primer AI's Systems for Acronym Identification and Disambiguation
The prevalence of ambiguous acronyms makes scientific documents harder to...

04/28/2023 · Information Redundancy and Biases in Public Document Information Extraction Benchmarks
Advances in the Visually-rich Document Understanding (VrDU) field and pa...
