Cross-Document Event Coreference Resolution Beyond Corpus-Tailored Systems

11/24/2020
by   Michael Bugert, et al.

Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents. CDCR aims to benefit downstream multi-document applications, but despite recent progress on corpora and model development, downstream improvements from applying CDCR have not yet been shown. The reason is that every CDCR system released to date was developed, trained, and tested on only a single corpus. This raises strong concerns about their generalizability, a must-have for downstream applications where the magnitude of domains or event mentions is likely to exceed that found in a curated corpus. To approach this issue, we define a uniform evaluation setup involving three CDCR corpora: ECB+, the Gun Violence Corpus, and the Football Coreference Corpus (which we reannotate at the token level to make our analysis possible). We compare a corpus-independent, feature-based system against a recent neural system developed for ECB+. Although inferior in absolute numbers, the feature-based system performs more consistently across all corpora, whereas the neural system is hit-and-miss. Via model introspection, we find that the importance of event actions, event time, etc. for resolving coreference in practice varies greatly between the corpora. Additional analysis shows that several systems overfit on the structure of the ECB+ corpus. We conclude with recommendations on how to move beyond corpus-tailored CDCR systems in the future, the most important being that evaluation on multiple CDCR corpora is essential. To facilitate future research, we release our dataset, annotation guidelines, and model implementation to the public.
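To make the idea of a uniform multi-corpus evaluation concrete, the following is a minimal sketch, not the authors' implementation. It scores one system on several corpora with the B-cubed metric, a common choice for coreference clustering; the abstract does not state which metric the paper uses, so that choice is an assumption. The function names (`b_cubed`, `evaluate_across_corpora`, `singleton_baseline`), the corpus names, and the toy mention data are all hypothetical.

```python
from collections import defaultdict


def b_cubed(gold, pred):
    """Return (precision, recall, F1) of B-cubed.

    `gold` and `pred` map each mention id to a cluster id and must
    cover the same mentions. The metric choice is an assumption.
    """
    assert gold.keys() == pred.keys(), "both clusterings must cover the same mentions"
    if not gold:
        return 0.0, 0.0, 0.0

    # Invert the mention -> cluster maps into cluster -> set of mentions.
    gold_clusters = defaultdict(set)
    pred_clusters = defaultdict(set)
    for mention, cluster in gold.items():
        gold_clusters[cluster].add(mention)
    for mention, cluster in pred.items():
        pred_clusters[cluster].add(mention)

    precision = recall = 0.0
    for mention in gold:
        g = gold_clusters[gold[mention]]
        p = pred_clusters[pred[mention]]
        overlap = len(g & p)
        precision += overlap / len(p)
        recall += overlap / len(g)

    precision /= len(gold)
    recall /= len(gold)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def evaluate_across_corpora(system, corpora):
    """Apply the same system and the same metric to every corpus.

    `system` takes a list of mention ids and returns a mention -> cluster map.
    Returns a dict of corpus name -> B-cubed F1.
    """
    return {
        name: b_cubed(gold, system(list(gold)))[2]
        for name, gold in corpora.items()
    }


def singleton_baseline(mentions):
    """Hypothetical baseline: every mention forms its own cluster."""
    return {mention: index for index, mention in enumerate(mentions)}


# Toy gold annotations (hypothetical, not taken from any real corpus).
toy_corpora = {
    "corpus_a": {"m1": 0, "m2": 0, "m3": 1},
    "corpus_b": {"m1": 0, "m2": 1},
}

scores = evaluate_across_corpora(singleton_baseline, toy_corpora)
for name, f1 in scores.items():
    print(f"{name}: B-cubed F1 = {f1:.3f}")
```

Reporting one score per corpus, rather than a single number from one corpus, is what exposes a system that does well on one dataset and poorly on another.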


