Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

12/02/2022
by Keshav Santhanam, et al.

Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these efficiency considerations are chosen and weighed. We hope that future benchmarks will adopt these guidelines toward more holistic IR evaluation.
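As a rough illustration of the point that the best system depends on the efficiency constraints, the sketch below picks the most accurate system that fits within a latency and cost budget. It is not the paper's evaluation procedure: the `SystemResult` fields, the `best_system` helper, the system names, and every number are hypothetical placeholders chosen only to show how the choice can flip as the budget changes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SystemResult:
    """One benchmark entry (hypothetical schema, not from the paper)."""
    name: str
    accuracy: float       # task quality metric, higher is better
    latency_ms: float     # query latency on the fixed hardware setting
    cost_per_hour: float  # hourly cost of that hardware setting


def best_system(
    results: Sequence[SystemResult],
    max_latency_ms: float,
    max_cost_per_hour: float,
) -> Optional[SystemResult]:
    """Return the most accurate system within both budgets, or None."""
    feasible = [
        r for r in results
        if r.latency_ms <= max_latency_ms
        and r.cost_per_hour <= max_cost_per_hour
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda r: r.accuracy)


# Placeholder values for illustration only; these are not measured results.
RESULTS = [
    SystemResult("system_a", accuracy=0.40, latency_ms=400.0, cost_per_hour=3.00),
    SystemResult("system_b", accuracy=0.37, latency_ms=50.0, cost_per_hour=0.50),
    SystemResult("system_c", accuracy=0.19, latency_ms=10.0, cost_per_hour=0.10),
]

if __name__ == "__main__":
    for latency, cost in [(1000.0, 5.00), (100.0, 1.00), (20.0, 0.20)]:
        winner = best_system(RESULTS, latency, cost)
        print(latency, cost, winner.name if winner else None)
```

With a loose budget the most accurate placeholder system wins; tightening either budget moves the choice to a cheaper or faster one, which an accuracy-only leaderboard would not reveal.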

research
12/10/2019

Neural-IR-Explorer: A Content-Focused Tool to Explore Neural Re-Ranking Results

In this paper we look beyond metrics-based evaluation of Information Ret...
research
07/10/2019

Let's measure run time! Extending the IR replicability infrastructure to include performance aspects

Establishing a docker-based replicability infrastructure offers the comm...
research
06/21/2023

Resources and Evaluations for Multi-Distribution Dense Information Retrieval

We introduce and define the novel problem of multi-distribution informat...
research
06/05/2023

Gen-IR @ SIGIR 2023: The First Workshop on Generative Information Retrieval

Generative information retrieval (IR) has experienced substantial growth...
research
01/12/2023

Taking Search to Task

The importance of tasks in information retrieval (IR) has been long argu...
research
09/07/2018

Challenges for Measuring Usefulness of Interactive IR Systems with Log-based Approaches

The usefulness evaluation model proposed by Cole et al. in 2009 [2] focu...
research
09/17/2019

Revealing the Importance of Semantic Retrieval for Machine Reading at Scale

Machine Reading at Scale (MRS) is a challenging task in which a system i...
