mRobust04: A Multilingual Version of the TREC Robust 2004 Benchmark

09/27/2022
by Vitor Jeronymo, et al.

Robust 2004 is an information retrieval benchmark whose large number of judgments per query makes it a reliable evaluation dataset. In this paper, we present mRobust04, a multilingual version of Robust04 that was translated into 8 languages using Google Translate. We also provide results of three different multilingual retrievers on this dataset. The dataset is available at https://huggingface.co/datasets/unicamp-dl/mrobust

research
08/31/2021

mMARCO: A Multilingual Version of MS MARCO Passage Ranking Dataset

The MS MARCO ranking dataset has been widely used for training deep lear...
research
09/28/2022

Multilingual Search with Subword TF-IDF

Multilingual search can be achieved with subword tokenization. The accur...
research
02/22/2022

A New Generation of Perspective API: Efficient Multilingual Character-level Transformers

On the world wide web, toxic content detectors are a crucial line of def...
research
09/27/2021

MFAQ: a Multilingual FAQ Dataset

In this paper, we present the first multilingual FAQ dataset publicly av...
research
09/07/2020

Why Not Simply Translate? A First Swedish Evaluation Benchmark for Semantic Similarity

This paper presents the first Swedish evaluation benchmark for textual s...
research
01/28/2023

Bipol: Multi-axes Evaluation of Bias with Explainability in Benchmark Datasets

We evaluate five English NLP benchmark datasets (available on the superG...
research
07/30/2021

mTVR: Multilingual Moment Retrieval in Videos

We introduce mTVR, a large-scale multilingual video moment retrieval dat...
