A Gold Standard Dataset for the Reviewer Assignment Problem

03/23/2023
by Ivan Stelmakh et al.

Many peer-review venues either use or are looking to use algorithms to assign submissions to reviewers. The crux of such automated approaches is the notion of the "similarity score", a numerical estimate of a reviewer's expertise in reviewing a paper, and many algorithms have been proposed to compute these scores. However, these algorithms have not been subjected to a principled comparison, making it difficult for stakeholders to choose an algorithm in an evidence-based manner. The key challenge in comparing existing algorithms and developing better ones is the lack of publicly available gold-standard data that would be needed to perform reproducible research. We address this challenge by collecting a novel dataset of similarity scores that we release to the research community. Our dataset consists of 477 self-reported expertise scores provided by 58 researchers who evaluated their expertise in reviewing papers they had read previously. We use this data to compare several popular algorithms employed in computer science conferences and offer recommendations for stakeholders. Our main findings are as follows. First, all algorithms make a non-trivial amount of error. For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases, highlighting the vital need for more research on the similarity-computation problem. Second, most existing algorithms are designed to work with the titles and abstracts of papers, and in this regime the Specter+MFR algorithm performs best. Third, to improve performance, it may be important to develop modern deep-learning-based algorithms that can make use of the full texts of papers: the classical TF-IDF algorithm enhanced with full texts is on par with the deep-learning-based Specter+MFR, which cannot make use of this information.
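The abstract mentions TF-IDF-based similarity scores and a pairwise paper-ordering task without detailing either. The sketch below is a rough, hypothetical illustration, not the authors' code: it builds a reviewer profile from past papers, scores two toy submissions by TF-IDF cosine similarity using scikit-learn, and computes the fraction of paper pairs ordered inconsistently with self-reported expertise. The toy texts, the expertise labels, and the concatenated-profile heuristic are all illustrative assumptions.

```python
# Illustrative sketch (not the paper's code): TF-IDF similarity scores
# for one reviewer, plus a pairwise-ordering error rate against
# self-reported expertise labels.
from itertools import combinations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: the reviewer's past papers and two submissions.
reviewer_papers = [
    "reviewer assignment via optimization in peer review",
    "estimating reviewer expertise from publication records",
]
submissions = [
    "matching papers to reviewers with similarity scores",
    "image segmentation with convolutional networks",
]
# Self-reported expertise for each submission (higher = more expert),
# standing in for the gold-standard labels collected in the dataset.
self_reported = [4, 1]

# Fit TF-IDF on all texts so reviewer and submission vectors share a vocabulary.
vectorizer = TfidfVectorizer()
vectorizer.fit(reviewer_papers + submissions)

# A simple reviewer profile: concatenate the reviewer's past papers.
profile_vec = vectorizer.transform([" ".join(reviewer_papers)])
submission_vecs = vectorizer.transform(submissions)

# Similarity score of the reviewer for each submission.
scores = cosine_similarity(profile_vec, submission_vecs).ravel()

# Pairwise-ordering error: fraction of submission pairs (with distinct
# gold labels) that the computed scores rank in the wrong order.
pairs = [
    (i, j)
    for i, j in combinations(range(len(submissions)), 2)
    if self_reported[i] != self_reported[j]
]
errors = sum(
    (scores[i] - scores[j]) * (self_reported[i] - self_reported[j]) < 0
    for i, j in pairs
)
print(f"similarity scores: {scores}")
print(f"pairwise error rate: {errors / len(pairs):.2f}")
```

The paper's easy/hard split presumably partitions such pairs by how far apart the self-reported expertise scores are; this sketch lumps all pairs together and counts score ties as correct, which a real evaluation would need to handle more carefully.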


Related research

09/02/2021 · Ranking Scientific Papers Using Preference Learning
Peer review is the main quality control mechanism in academia. Quality o...

02/01/2021 · The Harrington Yowlumne Narrative Corpus
Minority languages continue to lack adequate resources for their develop...

04/05/2022 · Integrating Rankings into Quantized Scores in Peer Review
In peer review, reviewers are usually asked to provide scores for the pa...

06/24/2022 · A Dataset on Malicious Paper Bidding in Peer Review
In conference peer review, reviewers are often asked to provide "bids" o...

02/09/2021 · Making Paper Reviewing Robust to Bid Manipulation Attacks
Most computer science conferences rely on paper bidding to assign review...

11/09/2017 · DLPaper2Code: Auto-generation of Code from Deep Learning Research Papers
With an abundance of research papers in deep learning, reproducibility o...

08/12/2019 · The Role of Publicly Available Data in MICCAI Papers from 2014 to 2018
Widely-used public benchmarks are of huge importance to computer vision ...
