Large-scale Evaluation of Transformer-based Article Encoders on the Task of Citation Recommendation

09/12/2022
by Zoran Medić, et al.

Recently introduced transformer-based article encoders (TAEs), designed to produce similar vector representations for mutually related scientific articles, have demonstrated strong performance on benchmark datasets for scientific article recommendation. However, the existing benchmark datasets predominantly focus on single domains and, in some cases, contain easy negatives in small candidate pools. Evaluating representations on such benchmarks might obscure the realistic performance of TAEs in setups with thousands of articles in candidate pools. In this work, we evaluate TAEs on large benchmarks with more challenging candidate pools. We compare the performance of TAEs with BM25, a lexical retrieval baseline, on the task of citation recommendation, where the model produces a ranked list of articles recommended for citation in a given input article. We find that BM25 is still very competitive with state-of-the-art neural retrievers, a finding that is surprising given the strong performance of TAEs on small benchmarks. As a remedy for the limitations of the existing benchmarks, we propose a new benchmark dataset for evaluating scientific article representations: the Multi-Domain Citation Recommendation dataset (MDCR), which covers different scientific fields and contains challenging candidate pools.
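
To make the lexical baseline concrete, the following is a minimal sketch of BM25 ranking over a candidate pool, with the input article's text used as the query. It is an illustration only, not the configuration used in the paper: the function names `tokenize` and `bm25_rank`, the whitespace tokenizer, and the parameter values `k1=1.2` and `b=0.75` (common defaults) are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real systems
    # usually add stemming and stop-word removal.
    return text.lower().split()


def bm25_rank(query, candidates, k1=1.2, b=0.75):
    """Return candidate indices ordered from most to least relevant.

    query      -- text of the input article (e.g. title and abstract)
    candidates -- list of candidate article texts
    k1, b      -- BM25 parameters (assumed defaults, not from the paper)
    """
    docs = [tokenize(c) for c in candidates]
    n = len(docs)
    if n == 0:
        return []
    # Average document length; fall back to 1 if every document is empty.
    avg_len = sum(len(d) for d in docs) / n or 1.0
    # Document frequency: number of candidates containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))

    scored = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always non-negative).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scored.append((score, idx))
    # Highest score first; ties are broken by original position.
    return [idx for _, idx in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    pool = [
        "transformer encoders for scientific article representations",
        "protein folding with molecular dynamics",
        "lexical retrieval with bm25 for citation recommendation",
    ]
    print(bm25_rank("citation recommendation with lexical retrieval", pool))
```

A TAE-based retriever would replace the term-matching score with a similarity between dense article vectors; the ranking step over the candidate pool stays the same.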


research
02/17/2020

Citation Recommendation: Approaches and Datasets

Citation recommendation describes the task of recommending citations for...
research
01/23/2020

Navigation-Based Candidate Expansion and Pretrained Language Models for Citation Recommendation

Citation recommendation systems for the scientific literature, to help a...
research
09/08/2023

Encoding Multi-Domain Scientific Papers by Ensembling Multiple CLS Tokens

Many useful tasks on scientific documents, such as topic classification ...
research
03/06/2021

ReadNet: A Hierarchical Transformer Framework for Web Article Readability Analysis

Analyzing the readability of articles has been an important sociolinguis...
research
07/08/2020

Learning Neural Textual Representations for Citation Recommendation

With the rapid growth of the scientific literature, manually selecting a...
research
02/25/2015

Topic-adjusted visibility metric for scientific articles

Measuring the impact of scientific articles is important for evaluating ...
research
04/30/2022

SciEv: Finding Scientific Evidence Papers for Scientific News

In the past decade, many scientific news media that report scientific br...
