Joint Upper & Lower Bound Normalization for IR Evaluation

In this paper, we present a novel perspective on IR evaluation by proposing a new family of evaluation metrics in which existing popular metrics (e.g., nDCG, MAP) are customized by introducing a query-specific lower-bound (LB) normalization term. While original metrics such as nDCG and MAP are normalized by their upper bounds, computed from an ideal ranked list, a corresponding LB normalization for them has not yet been studied. Specifically, we introduce two variants of the proposed LB normalization, in which the lower bound is estimated from a randomized ranking of the documents present in the evaluation set. We then conduct two case studies by instantiating the new framework for two popular IR evaluation metrics (with two variants each, e.g., DCG_UL_V1,2 and MSP_UL_V1,2) and comparing them against the traditional metrics without the proposed LB normalization. Experiments on two different datasets with eight Learning-to-Rank (LETOR) methods demonstrate the following properties of the new LB-normalized metrics: 1) Differences between two methods that are statistically significant under the original metric no longer remain statistically significant under the Upper Lower (UL) bound normalized version, and vice versa, especially for uninformative query sets. 2) Compared with the original metrics, our proposed UL-normalized metrics demonstrate higher Discriminatory Power and better Consistency across different datasets. These findings suggest that the IR community should consider UL normalization seriously when computing nDCG and MAP, and that a more in-depth study of UL normalization for general IR evaluation is warranted.
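To make the idea of joint upper and lower bound normalization concrete, here is a minimal sketch for DCG. It is not taken from the paper: the abstract does not give the formulas for the two variants, so the sketch assumes a simple min-max form, (DCG - LB) / (UB - LB), with the lower bound taken as the expected DCG of a uniformly random ranking of the query's documents. The function names (`dcg`, `expected_random_dcg`, `ul_normalized_dcg`), the use of raw relevance labels as gains, and the zero score for degenerate queries are all illustrative assumptions.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain of the first k items (log2 discount)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def expected_random_dcg(gains, k):
    """Expected DCG@k under a uniformly random permutation of `gains`.

    Under a random permutation every position holds the mean gain in
    expectation, so the expectation is mean_gain * sum of discounts.
    (Assumed lower bound; the paper's exact variants may differ.)
    """
    n = len(gains)
    if n == 0:
        return 0.0
    mean_gain = sum(gains) / n
    return mean_gain * sum(1 / math.log2(i + 2) for i in range(min(k, n)))


def ul_normalized_dcg(ranked_gains, k):
    """Assumed min-max form: (DCG - LB) / (UB - LB) for one query."""
    upper = dcg(sorted(ranked_gains, reverse=True), k)  # ideal ranking
    lower = expected_random_dcg(ranked_gains, k)        # random ranking
    if upper - lower < 1e-12:
        # Uninformative query: every ranking scores the same,
        # so no ranker can be distinguished from random (assumed convention).
        return 0.0
    return (dcg(ranked_gains, k) - lower) / (upper - lower)
```

Under these assumptions, an ideal ranking scores 1, a ranking no better than random scores about 0, and a worse-than-random ranking scores below 0. A query whose documents all share the same label contributes 0 instead of the perfect score that upper-bound-only normalization would assign to any ranking of it.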
