Hear Me Out: A Study on the Use of the Voice Modality for Crowdsourced Relevance Assessments

04/21/2023
by   Nirmal Roy, et al.
The creation of relevance assessments by human assessors (nowadays often crowdworkers) is a vital step when building IR test collections. Prior work has investigated assessor quality and behaviour, though little is known about the impact of a document's presentation modality on assessor efficiency and effectiveness. Given the rise of voice-based interfaces, we investigate whether it is feasible for assessors to judge the relevance of text documents via a voice-based interface. We ran a user study (n = 49) on a crowdsourcing platform in which participants judged the relevance of short and long documents sampled from the TREC Deep Learning corpus, presented to them in either the text or the voice modality. We found that: (i) participants are equally accurate in their judgements across both the text and voice modalities; (ii) as document length increases, participants take significantly longer to make relevance judgements in the voice condition (for documents longer than 120 words, almost twice as long); and (iii) the ability of assessors to ignore non-relevant stimuli (i.e., inhibition) affects assessment quality in the voice modality: assessors with higher inhibition are significantly more accurate than those with lower inhibition. Our results indicate that the voice modality can be reliably leveraged as a means to effectively collect relevance labels from crowdworkers.

research
03/03/2022

Do Perceived Gender Biases in Retrieval Results Affect Relevance Judgements?

This work investigates the effect of gender-stereotypical biases in the ...
research
12/05/2020

Using voice note-taking to promote learners' conceptual understanding

Though recent technological advances have enabled note-taking through di...
research
06/03/2018

Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collections Accurately and Affordably

Crowdsourcing offers an affordable and scalable means to collect relevan...
research
12/24/2020

Understanding and Predicting the Characteristics of Test Collections

Shared-task campaigns such as NIST TREC select documents to judge by poo...
research
07/12/2016

Natural brain-information interfaces: Recommending information by relevance inferred from human brain signals

Finding relevant information from large document collections such as the...
research
04/18/2021

Transitivity, Time Consumption, and Quality of Preference Judgments in Crowdsourcing

Preference judgments have been demonstrated as a better alternative to g...
research
04/02/2021

GAVIN: Gaze-Assisted Voice-Based Implicit Note-taking

Annotation is an effective reading strategy people often undertake while...
