Hear Me Out: A Study on the Use of the Voice Modality for Crowdsourced Relevance Assessments

by Nirmal Roy, et al.

The creation of relevance assessments by human assessors (often nowadays crowdworkers) is a vital step when building IR test collections. Prior works have investigated assessor quality and behaviour, though little research has been done into the impact of a document's presentation modality on assessor efficiency and effectiveness. Given the rise of voice-based interfaces, we investigate whether it is feasible for assessors to judge the relevance of text documents via a voice-based interface. We ran a user study (n = 49) on a crowdsourcing platform where participants judged the relevance of short and long documents sampled from the TREC Deep Learning corpus, presented to them in either the text or the voice modality. We found that: (i) participants are equally accurate in their judgements across the text and voice modalities; (ii) as document length increases, participants take significantly longer to make relevance judgements in the voice condition (for documents longer than 120 words, it takes almost twice as much time); and (iii) the ability of assessors to ignore stimuli that are not relevant (i.e., inhibition) affects assessment quality in the voice modality: assessors with higher inhibition are significantly more accurate than those with lower inhibition. Our results indicate that we can reliably leverage the voice modality as a means to effectively collect relevance labels from crowdworkers.


Do Perceived Gender Biases in Retrieval Results Affect Relevance Judgements?

This work investigates the effect of gender-stereotypical biases in the ...

Using voice note-taking to promote learners' conceptual understanding

Though recent technological advances have enabled note-taking through di...

Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collections Accurately and Affordably

Crowdsourcing offers an affordable and scalable means to collect relevan...

Understanding and Predicting the Characteristics of Test Collections

Shared-task campaigns such as NIST TREC select documents to judge by poo...

Natural brain-information interfaces: Recommending information by relevance inferred from human brain signals

Finding relevant information from large document collections such as the...

Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing

Preference judgments have been demonstrated as a better alternative to g...

GAVIN: Gaze-Assisted Voice-Based Implicit Note-taking

Annotation is an effective reading strategy people often undertake while...
