Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collections Accurately and Affordably

06/03/2018
by Mucahid Kutlu, et al.

Crowdsourcing offers an affordable and scalable means to collect relevance judgments for IR test collections. However, crowd assessors may show higher variance in judgment quality than trusted assessors. In this paper, we investigate how to effectively utilize both groups of assessors in partnership. We specifically investigate how agreement in judging is correlated with three factors: relevance category, document ranking, and topical variance. Based on this, we then propose two collaborative judging methods in which a portion of the document-topic pairs are assessed by in-house judges while the rest are assessed by crowd workers. Experiments conducted on two TREC collections show encouraging results when we distribute work intelligently between our two groups of assessors.
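To make the idea concrete, the sketch below shows one simple way such a split could be arranged: for each topic, the top-ranked pooled documents are routed to in-house judges and the remaining pairs go to crowd workers. The function name, data layout, and rank cutoff here are illustrative assumptions for a minimal sketch, not the paper's actual allocation methods.

```python
# Hypothetical sketch: split judging work between in-house assessors and
# crowd workers by pooled document rank. The cutoff (in_house_depth) and
# the data layout are illustrative assumptions, not the paper's algorithm.

def split_judging_pool(pooled_docs_by_topic, in_house_depth=10):
    """Assign each (topic, doc) pair to in-house or crowd judging.

    pooled_docs_by_topic: dict mapping topic_id -> list of doc_ids,
        ordered by pooled rank (best-ranked first).
    in_house_depth: pairs ranked at or above this depth go to in-house
        assessors; all remaining pairs go to crowd workers.
    """
    in_house, crowd = [], []
    for topic_id, ranked_docs in pooled_docs_by_topic.items():
        for rank, doc_id in enumerate(ranked_docs, start=1):
            if rank <= in_house_depth:
                in_house.append((topic_id, doc_id))
            else:
                crowd.append((topic_id, doc_id))
    return in_house, crowd


if __name__ == "__main__":
    pool = {
        "topic-401": ["d12", "d07", "d33", "d91"],
        "topic-402": ["d05", "d88", "d19"],
    }
    expert_pairs, crowd_pairs = split_judging_pool(pool, in_house_depth=2)
    print(len(expert_pairs), "pairs for in-house judges")
    print(len(crowd_pairs), "pairs for crowd workers")
```

Since the paper also examines how judging agreement varies with relevance category and topical variance, a real allocation scheme could condition on those signals as well rather than on rank alone.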

research · 12/24/2020
Understanding and Predicting the Characteristics of Test Collections
Shared-task campaigns such as NIST TREC select documents to judge by poo...

research · 04/21/2023
Hear Me Out: A Study on the Use of the Voice Modality for Crowdsourced Relevance Assessments
The creation of relevance assessments by human assessors (often nowadays...

research · 03/27/2019
Graded Relevance Assessments and Graded Relevance Measures of NTCIR: A Survey of the First Twenty Years
NTCIR was the first large-scale IR evaluation conference to construct te...

research · 11/02/2022
Relevance Assessments for Web Search Evaluation: Should We Randomise or Prioritise the Pooled Documents? (CORRECTED VERSION)
In the context of depth-k pooling for constructing web search test colle...

research · 08/21/2020
Investigating Differences in Crowdsourced News Credibility Assessment: Raters, Tasks, and Expert Criteria
Misinformation about critical issues such as climate change and vaccine ...

research · 02/13/2022
Learning to Rank from Relevance Judgments Distributions
Learning to Rank (LETOR) algorithms are usually trained on annotated cor...

research · 09/27/2018
Consistency and Variation in Kernel Neural Ranking Model
This paper studies the consistency of the kernel-based neural ranking mo...