Impact of the Number of Votes on the Reliability and Validity of Subjective Speech Quality Assessment in the Crowdsourcing Approach

03/25/2020
by Babak Naderi, et al.

The subjective quality of transmitted speech is traditionally assessed in a controlled laboratory environment according to ITU-T Rec. P.800. With crowdsourcing, in contrast, crowdworkers participate in a subjective online experiment using their own listening devices and in their own working environments. Despite these less controlled conditions, the increasing use of crowdsourcing micro-task platforms for quality assessment has created a strong demand for standardized methods, resulting in ITU-T Rec. P.808. This work investigates the impact of the number of judgments on the reliability and validity of quality ratings collected through crowdsourcing-based speech quality assessment, as an input to ITU-T Rec. P.808. Three crowdsourcing experiments on different platforms were conducted to evaluate the overall quality of three speech datasets, using the Absolute Category Rating (ACR) procedure. For each dataset, Mean Opinion Scores (MOS) were calculated using differing numbers of crowdsourcing judgments. The results were then compared to MOS values collected in a standard laboratory experiment, to assess the validity of the crowdsourcing approach as a function of the number of votes. In addition, the reliability of the average scores was analyzed by examining inter-rater reliability, the gain in certainty, and the confidence interval of the MOS. The results suggest a required number of votes per condition and allow its impact on validity and reliability to be modeled.
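The abstract's central quantity, the MOS and its confidence as a function of the number of votes per condition, can be illustrated with a short sketch. This is not code from the paper: the t-based confidence interval, the subsampling scheme, the function names, and the example votes are all assumptions made here for illustration.

```python
# Illustrative sketch (assumptions, not the paper's code): estimate the MOS
# and its 95% confidence interval for one condition, and show how the average
# interval width shrinks as the number of crowdsourced votes grows.
import numpy as np
from scipy import stats


def mos_with_ci(votes, confidence=0.95):
    """Mean Opinion Score with a t-distribution confidence half-width."""
    votes = np.asarray(votes, dtype=float)
    n = len(votes)
    mos = votes.mean()
    # Standard error of the mean; t-interval as commonly used for MOS.
    sem = stats.sem(votes)
    half_width = sem * stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mos, half_width


def ci_width_vs_num_votes(votes, sizes, n_resamples=200, seed=0):
    """Average CI half-width when only k of the collected votes are used.

    Repeatedly subsamples k votes without replacement (an assumed scheme,
    chosen for this sketch) and averages the resulting half-widths.
    """
    rng = np.random.default_rng(seed)
    votes = np.asarray(votes, dtype=float)
    results = {}
    for k in sizes:
        widths = [
            mos_with_ci(rng.choice(votes, size=k, replace=False))[1]
            for _ in range(n_resamples)
        ]
        results[k] = float(np.mean(widths))
    return results


# Example: hypothetical 5-point ACR votes for a single condition.
votes = [4, 3, 4, 5, 3, 4, 2, 4, 3, 4, 5, 3, 4, 4, 3,
         4, 3, 5, 4, 3, 4, 4, 2, 3, 4, 4, 3, 4, 5, 4]
print(mos_with_ci(votes))                       # full-panel MOS, CI half-width
print(ci_width_vs_num_votes(votes, [5, 10, 20, 30]))
```

Running the sketch shows the expected pattern: the confidence interval narrows roughly with the square root of the number of votes, which is the trade-off the paper quantifies when recommending a number of votes per condition.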

Related research

04/11/2020
Application of Just-Noticeable Difference in Quality as Environment Suitability Test for Crowdsourcing Speech Quality Assessment Task
Crowdsourcing micro-task platforms facilitate subjective media quality a...

05/17/2020
An Open source Implementation of ITU-T Recommendation P.808 with Validation
The ITU-T Recommendation P.808 provides a crowdsourcing approach for con...

04/09/2021
Speech Quality Assessment in Crowdsourcing: Comparison Category Rating Method
Traditionally, Quality of Experience (QoE) for a communication system is...

10/25/2020
A Crowdsourcing Extension of the ITU-T Recommendation P.835 with Validation
The quality of the speech communication systems, which include noise sup...

03/10/2019
Deep Robust Subjective Visual Property Prediction in Crowdsourcing
The problem of estimating subjective visual properties (SVP) of images (...

04/17/2021
Comparison of remote experiments using crowdsourcing and laboratory experiments on speech intelligibility
Many subjective experiments have been performed to develop objective spe...

11/05/2020
Challenges and strategies for running controlled crowdsourcing experiments
This paper reports on the challenges and lessons we learned while runnin...
