SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts

04/18/2021 ∙ by Arie Cattan, et al. ∙ 11

Determining coreference of concept mentions across multiple documents is fundamental for natural language understanding. Work on cross-document coreference resolution (CDCR) typically considers mentions of events in the news, which do not often involve abstract technical concepts that are prevalent in science and technology. These complex concepts take diverse or ambiguous forms and have many hierarchical levels of granularity (e.g., tasks and subtasks), posing challenges for CDCR. We present a new task of hierarchical CDCR for concepts in scientific papers, with the goal of jointly inferring coreference clusters and hierarchy between them. We create SciCo, an expert-annotated dataset for this task, which is 3X larger than the prominent ECB+ resource. We find that tackling both coreference and hierarchy at once outperforms disjoint models, which we hope will spur development of joint models for SciCo.



There are no comments yet.


page 1

page 4

Code Repositories

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.