Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding

07/04/2022
by Leonid Boytsov, et al.

We carry out a comprehensive evaluation of 13 recent models for ranking long documents using two popular collections (MS MARCO documents and Robust04). Our model zoo includes two specialized Transformer models (such as Longformer) that can process long documents without the need to split them. Along the way, we document several difficulties in training and comparing such models. Somewhat surprisingly, we find the simple FirstP baseline (truncating documents to satisfy the input-sequence constraint of a typical Transformer model) to be quite effective. We analyze the distribution of relevant passages (inside documents) to explain this phenomenon. We further argue that, despite their widespread use, Robust04 and MS MARCO documents are not particularly useful for benchmarking long-document models.
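
To make the FirstP baseline concrete, the sketch below scores a query-document pair with an off-the-shelf cross-encoder after truncating the document so that the concatenated input fits the model's sequence limit. This is an illustrative Python sketch, not the paper's code: the checkpoint name cross-encoder/ms-marco-MiniLM-L-6-v2 and the 512-token limit are assumptions, and any BERT-style pointwise ranking model could be substituted.

```python
# Minimal FirstP-style re-ranking sketch (illustrative only).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def firstp_score(query: str, document: str, max_length: int = 512) -> float:
    """Score a query-document pair after truncating the document so the
    concatenated input fits the model's input-sequence limit (FirstP)."""
    inputs = tokenizer(
        query,
        document,
        truncation="only_second",  # keep the full query, cut the document tail
        max_length=max_length,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.squeeze().item()

# Example: rank candidate documents for one query by descending score.
docs = ["A very long candidate document ...", "Another candidate document ..."]
ranked = sorted(docs, key=lambda d: firstp_score("long document ranking", d), reverse=True)
```

The only FirstP-specific choice here is the truncation step: the document is cut at the model's input limit rather than being split into passages and aggregated.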

Related research

09/20/2020  Longformer for MS MARCO Document Re-ranking Task
    Two step document ranking, where the initial retrieval is done by a clas...

06/02/2021  Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling
    Transformer is important for text modeling. However, it has difficulty i...

02/04/2020  Interpretable Time-Budget-Constrained Contextualization for Re-Ranking
    Search engines operate under a strict time constraint as a fast response...

01/14/2021  Analysis of E-commerce Ranking Signals via Signal Temporal Logic
    The timed position of documents retrieved by learning to rank models can...

05/09/2022  Long Document Re-ranking with Modular Re-ranker
    Long document re-ranking has been a challenging problem for neural re-ra...

09/02/2020  Identifying Documents In-Scope of a Collection from Web Archives
    Web archive data usually contains high-quality documents that are very u...

05/11/2022  Query-Based Keyphrase Extraction from Long Documents
    Transformer-based architectures in natural language processing force inp...
