MultiCite: Modeling realistic citations requires moving beyond the single-sentence single-label setting

07/01/2021
by Anne Lauscher, et al.

Citation context analysis (CCA) is an important task in natural language processing that studies how and why scholars discuss each other's work. Despite being studied for decades, traditional frameworks for CCA have largely relied on overly simplistic assumptions about how authors cite, which ignore several important phenomena. For instance, scholarly papers often contain rich discussions of cited work that span multiple sentences and express multiple intents concurrently. Yet CCA is typically approached as a single-sentence, single-label classification task, and thus existing datasets fail to capture this discourse. In our work, we address this research gap by proposing a novel framework for CCA as a document-level context extraction and labeling task. We release MultiCite, a new dataset of 12,653 citation contexts from over 1,200 computational linguistics papers. Not only is it the largest collection of expert-annotated citation contexts to date, but it also contains multi-sentence, multi-label citation contexts within full paper texts. Finally, we demonstrate how our dataset, while still usable for training classic CCA models, also supports the development of new types of models for CCA beyond fixed-width text classification. We release our code and dataset at https://github.com/allenai/multicite.
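To make the contrast with the single-sentence, single-label setting concrete, the following is a minimal sketch of how multi-sentence, multi-label citation contexts could be represented and turned into per-sentence label sets. It is not the MultiCite schema or the authors' code: the `CitationContext` class, the `to_multilabel` helper, the paper key `Smith2019`, and the intent names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CitationContext:
    """Hypothetical record: one cited paper, one intent, and the sentences expressing it."""
    cited_paper: str
    intent: str
    sentence_ids: list[int]


def to_multilabel(contexts, num_sentences):
    """Map each sentence index to the set of intents whose context covers it."""
    labels = {i: set() for i in range(num_sentences)}
    for ctx in contexts:
        for sid in ctx.sentence_ids:
            labels[sid].add(ctx.intent)
    return labels


# Two overlapping contexts for the same (made-up) cited paper in a 4-sentence passage.
contexts = [
    CitationContext("Smith2019", "background", [0, 1]),
    CitationContext("Smith2019", "differences", [1, 2]),
]
labels = to_multilabel(contexts, num_sentences=4)
```

In this sketch, sentence 1 carries two intents at once and each context spans more than one sentence, which a single-sentence, single-label format cannot express.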


