HC4: A New Suite of Test Collections for Ad Hoc CLIR

01/24/2022
by Dawn Lawrie, et al.

HC4 is a new suite of test collections for ad hoc Cross-Language Information Retrieval (CLIR), with Common Crawl News documents in Chinese, Persian, and Russian, topics in English and in the document languages, and graded relevance judgments. New test collections are needed because existing CLIR test collections, which were built by pooling traditional CLIR runs, have systematic gaps in their relevance judgments when used to evaluate neural CLIR methods. The HC4 collections contain 60 topics and about half a million documents for each of Chinese and Persian, and 54 topics and five million documents for Russian. Interactive search and judgment were used to seed an active learning process, which then determined which documents to annotate. Documents were judged on a three-grade relevance scale. This paper describes the design and construction of the new test collections and provides baseline results that demonstrate their utility for evaluating systems.
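To illustrate how graded judgments of this kind are typically used, the following is a minimal sketch of nDCG computed over a three-grade scale. It is not taken from the paper: the grade values (0, 1, 2), the use of the grade itself as the gain, the function names, and the toy document identifiers are all assumptions made for illustration, and the abstract does not state which metric or gain mapping the authors use.

```python
import math


def dcg(gains):
    """Discounted cumulative gain for a list of gains in rank order."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranking, qrels, k=10):
    """nDCG@k for one topic.

    ranking: list of document ids, best first (hypothetical system output).
    qrels:   dict mapping document id -> relevance grade (assumed 0, 1, or 2).
    Unjudged documents are treated as grade 0, which is a common convention
    but an assumption here.
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranking[:k]]
    ideal_gains = sorted(qrels.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Toy judgments for a single topic (made-up ids and grades).
qrels = {"doc_a": 2, "doc_b": 1, "doc_c": 0}
perfect_run = ["doc_a", "doc_b", "doc_c"]
reversed_run = ["doc_c", "doc_b", "doc_a"]

perfect_score = ndcg_at_k(perfect_run, qrels)
reversed_score = ndcg_at_k(reversed_run, qrels)
```

Because unjudged documents count as non-relevant in this sketch, a system that retrieves relevant but unjudged documents is penalized, which is the kind of judgment gap the abstract raises for pooled collections.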


