On the Reliability of Test Collections for Evaluating Systems of Different Types

04/28/2020
by Emine Yilmaz, et al.

As deep-learning-based models are increasingly used for information retrieval (IR), a major challenge is ensuring that test collections are available for measuring their quality. Test collections are built by pooling the results of various retrieval systems, but until recently these pools did not include deep learning systems. This raises a major challenge for reusable evaluation: since deep-learning-based models use external resources (e.g., word embeddings) and advanced representations, as opposed to traditional methods that rely mainly on lexical similarity, they may return types of relevant documents that were not identified in the original pooling. If so, test collections constructed using traditional methods are likely to produce biased and unfair evaluation results for deep learning (neural) systems. This paper uses simulated pooling to test the fairness and reusability of test collections, showing that pooling based only on traditional systems can lead to biased evaluation of deep learning systems.
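To make the pooling concern concrete, here is a minimal Python sketch, assuming standard depth-k pooling (the union of each contributing run's top-k documents per topic). It is a generic illustration, not the paper's simulated pooling procedure: the run names (`bm25`, `ql`, `neural`), the toy document ids, and the `unjudged_fraction` diagnostic are hypothetical. It builds a pool from traditional runs only and measures how much of a neural run's top-k falls outside that pool, i.e. would be unjudged and is typically treated as non-relevant at evaluation time.

```python
from typing import Dict, List, Set

# A run maps a topic id to a ranked list of document ids (hypothetical format).
Run = Dict[str, List[str]]
Pool = Dict[str, Set[str]]


def depth_k_pool(runs: List[Run], k: int) -> Pool:
    """Depth-k pooling: per topic, the union of the top-k documents of every run."""
    pool: Pool = {}
    for run in runs:
        for topic, ranking in run.items():
            pool.setdefault(topic, set()).update(ranking[:k])
    return pool


def unjudged_fraction(run: Run, pool: Pool, k: int) -> float:
    """Fraction of a run's top-k documents that lie outside the pool (hypothetical diagnostic)."""
    total = 0
    missing = 0
    for topic, ranking in run.items():
        judged = pool.get(topic, set())
        for doc in ranking[:k]:
            total += 1
            if doc not in judged:
                missing += 1
    return missing / total if total else 0.0


# Toy runs: two lexical ("traditional") systems and one neural system.
bm25: Run = {"t1": ["d1", "d2", "d3"]}
ql: Run = {"t1": ["d2", "d1", "d4"]}
neural: Run = {"t1": ["d5", "d1", "d6"]}

# Simulate a collection pooled only from traditional systems.
traditional_pool = depth_k_pool([bm25, ql], k=2)
# Compare with a pool that also includes the neural run.
full_pool = depth_k_pool([bm25, ql, neural], k=2)

print(unjudged_fraction(neural, traditional_pool, k=2))
print(unjudged_fraction(neural, full_pool, k=2))
```

With the traditional-only pool, the neural run's top-ranked document `d5` was never judged, so half of its top-2 is unjudged, while both lexical runs are fully covered; adding the neural run to the pool removes that gap. Any metric that counts unjudged documents as non-relevant would penalize the neural run in the first setting, which is the kind of bias the abstract describes.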


