SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction

12/21/2020
by Thomas van Dongen, et al.

Predicting the number of citations of scholarly documents is an emerging task in scholarly document processing. Besides its intrinsic merit, the citation count also serves as an imperfect proxy for quality, with the advantage of being cheaply available for large volumes of scholarly documents. Previous work on citation count prediction has used either relatively small training sets or larger datasets with short, incomplete input text. In this work we combine the open-access ACL Anthology collection with the Semantic Scholar bibliometric database to create a large corpus of scholarly documents with associated citation information, and we propose a new citation prediction model called SChuBERT. In our experiments we compare SChuBERT with several state-of-the-art citation prediction models and show that it outperforms previous methods by a large margin. We also show the merit of using more training data and longer input for citation count prediction.


