The Information Retrieval Experiment Platform

05/30/2023
by Maik Fröbe, et al.

We integrate ir_datasets, ir_measures, and PyTerrier with TIRA in the Information Retrieval Experiment Platform (TIREx) to promote more standardized, reproducible, scalable, and even blinded retrieval experiments. Standardization is achieved when a retrieval approach implements PyTerrier's interfaces and the input and output of an experiment are compatible with ir_datasets and ir_measures. However, none of this is required for reproducibility and scalability, as TIRA can run any dockerized software locally or remotely in a cloud-native execution environment. Version control and caching ensure efficient (re)execution. TIRA allows for blind evaluation when an experiment runs on a remote server or cloud not under the control of the experimenter. The test data and ground truth are then hidden from public access, and the retrieval software has to process them in a sandbox that prevents data leaks. We currently host an instance of TIREx with 15 corpora (1.9 billion documents) on which 32 shared retrieval tasks are based. Using Docker images of 50 standard retrieval approaches, we automatically evaluated all approaches on all tasks (50 · 32 = 1,600 runs) in less than a week on a midsize cluster (1,620 CPU cores and 24 GPUs). This instance of TIREx is open for submissions, will be integrated with the IR Anthology, and will be released as open source.
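To make the standardization idea concrete, the following is a minimal, standard-library-only sketch of the data flow it implies: a retrieval approach turns topics into ranked result rows, the rows are written in the common TREC run format (`qid Q0 docno rank score tag`) that ir_measures can read, and a measure is computed against relevance judgments. It is not TIREx code. The toy corpus, topics, qrels, the term-overlap scorer, and the helper names `rank`, `to_trec_run`, and `precision_at_k` are all hypothetical. In a real setup, topics and qrels would come from ir_datasets, the ranker would be a PyTerrier transformer, and the measure would come from ir_measures.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical in-memory stand-ins for data that ir_datasets would provide.
DOCS = {
    "d1": "reproducible retrieval experiments with docker",
    "d2": "blinded evaluation in a sandbox",
    "d3": "shared tasks for information retrieval",
}
TOPICS = {"q1": "reproducible retrieval", "q2": "blinded evaluation"}
QRELS = {"q1": {"d1": 1}, "q2": {"d2": 1}}


@dataclass
class Row:
    # Mirrors the qid/docno/rank/score columns of PyTerrier result frames.
    qid: str
    docno: str
    rank: int
    score: float


def rank(topics, docs, k=10):
    """Hypothetical toy ranker: scores documents by query-term overlap."""
    rows = []
    for qid, query in topics.items():
        q_terms = Counter(query.lower().split())
        scored = []
        for docno, text in docs.items():
            d_terms = Counter(text.lower().split())
            score = float(sum(min(c, d_terms[t]) for t, c in q_terms.items()))
            if score > 0:
                scored.append((docno, score))
        # Highest score first; ties broken by docno for determinism.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        for r, (docno, score) in enumerate(scored[:k], start=1):
            rows.append(Row(qid, docno, r, score))
    return rows


def to_trec_run(rows, tag="toy"):
    """Serialize rows as TREC run lines: qid Q0 docno rank score tag."""
    return [f"{r.qid} Q0 {r.docno} {r.rank} {r.score} {tag}" for r in rows]


def precision_at_k(rows, qrels, k):
    """Mean P@k over judged topics (divides by k, as trec_eval does)."""
    by_qid = {}
    for r in rows:
        by_qid.setdefault(r.qid, []).append(r)
    values = []
    for qid, judged in qrels.items():
        top = sorted(by_qid.get(qid, []), key=lambda r: r.rank)[:k]
        relevant = sum(1 for r in top if judged.get(r.docno, 0) > 0)
        values.append(relevant / k)
    return sum(values) / len(values)


if __name__ == "__main__":
    run = rank(TOPICS, DOCS)
    print("\n".join(to_trec_run(run)))
    print("P@1 =", precision_at_k(run, QRELS, 1))
```

The point of the shared run format is decoupling: any software, dockerized or not, that emits such lines can be evaluated by the same measure code without knowing how the ranking was produced.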

research
07/01/2018

Information Retrieval in the Cloud

There has been a recent trend to migrate IT infrastructure into the clou...
research
09/29/2022

Multi-stage Information Retrieval for Vietnamese Legal Texts

This study deals with the problem of information retrieval (IR) for Viet...
research
08/28/2018

MIaS: Math-Aware Retrieval in Digital Mathematical Libraries

Digital mathematical libraries (DMLs) such as arXiv, Numdam, and EuDML c...
research
08/21/2023

Evaluating Temporal Persistence Using Replicability Measures

In real-world Information Retrieval (IR) experiments, the Evaluation Env...
research
08/22/2023

Large-scale information retrieval in software engineering – an experience report from industrial application

Software Engineering activities are information intensive. Research prop...
research
01/19/2023

New Metrics to Encourage Innovation and Diversity in Information Retrieval Approaches

In evaluation campaigns, participants often explore variations of popula...
research
08/19/2022

Real and simulated CBM data interacting with an ESCAPE datalake

Integration of the ESCAPE and CBM software environment. The ESCAPE datal...
