Efficient Test Collection Construction via Active Learning

01/17/2018
by   Md Mustafizur Rahman, et al.

To create a new IR test collection at minimal cost, we must carefully select which documents merit human relevance judgments. Shared-task campaigns such as NIST TREC determine this by pooling search results from many participating systems (and often interactive runs as well), thereby identifying the documents most likely to be relevant in a given collection. While effective, it would be preferable to build a new test collection without running an entire shared task. Toward this end, we investigate multiple active learning (AL) strategies that, without relying on system rankings: 1) select which documents human assessors should judge; and 2) automatically classify the relevance of the remaining unjudged documents. Because the scarcity of relevant documents tends to yield highly imbalanced training data for model estimation, we investigate sampling strategies to mitigate class imbalance. We report experiments on four TREC collections with varying scarcity of relevant documents, measuring both labeling accuracy and the rank correlation obtained when participant systems are evaluated with these labels rather than NIST judgments. Results demonstrate the effectiveness of our approach, and further analysis shows how varying relevance scarcity, within and across collections, impacts the findings.
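The loop the abstract describes (judge a few documents, retrain, pick the next most informative ones, then auto-label the rest) can be sketched roughly as below. This is a hypothetical illustration, not the paper's implementation: it uses uncertainty sampling as the AL strategy, random oversampling to mitigate class imbalance, and synthetic "document features" in place of a real collection. All names (`oversample`, `labeled`, `pool`) are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for document features: 1000 docs, scarce positives
# (relevant documents), mimicking the class imbalance the abstract notes.
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + 0.5 * X[:, 1] > 1.6).astype(int)

def oversample(X_l, y_l, rng):
    """Random oversampling: replicate minority (relevant) examples."""
    pos = np.flatnonzero(y_l == 1)
    neg = np.flatnonzero(y_l == 0)
    if min(len(pos), len(neg)) == 0 or len(pos) >= len(neg):
        return X_l, y_l
    extra = rng.choice(pos, size=len(neg) - len(pos), replace=True)
    idx = np.concatenate([np.arange(len(y_l)), extra])
    return X_l[idx], y_l[idx]

labeled = list(rng.choice(len(y), size=20, replace=False))  # seed judgments
pool = [i for i in range(len(y)) if i not in labeled]

for _ in range(10):  # 10 rounds of 10 simulated human judgments each
    if len(set(y[labeled])) < 2:
        # Cold start: no relevant doc judged yet, fall back to random picks.
        batch = rng.choice(len(pool), size=10, replace=False)
    else:
        X_l, y_l = oversample(X[labeled], y[labeled], rng)
        clf = LogisticRegression(max_iter=1000).fit(X_l, y_l)
        probs = clf.predict_proba(X[pool])[:, 1]
        batch = np.argsort(np.abs(probs - 0.5))[:10]  # most uncertain first
    for j in sorted(batch, reverse=True):
        labeled.append(pool.pop(int(j)))

# Final model auto-labels the remaining unjudged documents.
X_l, y_l = oversample(X[labeled], y[labeled], rng)
clf = LogisticRegression(max_iter=1000).fit(X_l, y_l)
auto_labels = clf.predict(X[pool])
print(len(labeled), len(pool))  # → 120 880
```

Only 120 of 1000 documents receive (simulated) human judgments here; the classifier supplies the rest, which is the cost saving the abstract targets.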


Related research

12/24/2020 · Understanding and Predicting the Characteristics of Test Collections
Shared-task campaigns such as NIST TREC select documents to judge by poo...

01/24/2022 · HC4: A New Suite of Test Collections for Ad Hoc CLIR
HC4 is a new suite of test collections for ad hoc Cross-Language Informa...

01/26/2022 · Can Old TREC Collections Reliably Evaluate Modern Neural Retrieval Models?
Neural retrieval models are generally regarded as fundamentally differen...

09/03/2015 · Incremental Active Opinion Learning Over a Stream of Opinionated Documents
Applications that learn from opinionated documents, like tweets or produ...

03/27/2019 · Graded Relevance Assessments and Graded Relevance Measures of NTCIR: A Survey of the First Twenty Years
NTCIR was the first large-scale IR evaluation conference to construct te...

04/23/2023 · Query-specific Variable Depth Pooling via Query Performance Prediction towards Reducing Relevance Assessment Effort
Due to the massive size of test collections, a standard practice in IR e...

01/31/2018 · ILPS at TREC 2017 Common Core Track
The TREC 2017 Common Core Track aimed at gathering a diverse set of part...
