Describing Subjective Experiment Consistency by p-Value P-P Plot

09/28/2020
by Jakub Nawała, et al.

Some phenomena cannot be measured without subjective testing. However, subjective testing is complex, with many influencing factors whose interplay can yield either precise or incorrect results. Researchers therefore need a tool that classifies the results of a subjective experiment as either consistent or inconsistent, so that they can decide whether to treat the gathered scores as quality ground truth data. Knowing whether subjective scores can be trusted is key to drawing valid conclusions and building functional tools based on those scores (e.g., algorithms assessing the perceived quality of multimedia materials). We provide a tool that classifies a subjective experiment (and all its results) as either consistent or inconsistent. Additionally, the tool identifies stimuli with an irregular score distribution. The approach treats subjective scores as a random variable drawn from the discrete Generalized Score Distribution (GSD). The GSD, in combination with a bootstrapped G-test of goodness-of-fit, allows us to construct a p-value P-P plot that visualizes the experiment's consistency. The tool safeguards researchers from using inconsistent subjective data. In this way, it helps ensure that the conclusions they draw and the tools they build are more precise and trustworthy. The proposed approach agrees with expectations derived solely from the experiment design descriptions of 21 real-life multimedia quality subjective experiments.
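
The pipeline named in the abstract (fit a discrete score model per stimulus, run a bootstrapped G-test of goodness-of-fit, then summarize the per-stimulus p-values in a P-P plot) can be illustrated with a minimal sketch. This is not the authors' implementation. Because the abstract does not give the GSD's parameterization, the sketch swaps in a simple stand-in model, score = 1 + Binomial(4, p) on an assumed five-point scale. The function names (`fit_probs`, `g_statistic`, `bootstrap_p_value`, `pp_points`), the number of bootstrap replicates, and the way the P-P points are computed are all assumptions made for illustration.

```python
import math
import random

SCALE = [1, 2, 3, 4, 5]  # assumed five-point rating scale


def fit_probs(counts):
    """Fit the stand-in model (NOT the GSD): score = 1 + Binomial(4, p).

    p is estimated from the sample mean of the scores.
    """
    n = sum(counts)
    mean = sum(s * c for s, c in zip(SCALE, counts)) / n
    p = (mean - 1) / 4
    return [math.comb(4, k) * p**k * (1 - p) ** (4 - k) for k in range(5)]


def g_statistic(counts, probs):
    """G = 2 * sum(O * ln(O / E)) over categories with O > 0."""
    n = sum(counts)
    return 2 * sum(
        c * math.log(c / (n * q)) for c, q in zip(counts, probs) if c > 0
    )


def bootstrap_p_value(counts, n_boot=200, seed=0):
    """Parametric bootstrap p-value of the G-test for one stimulus."""
    rng = random.Random(seed)
    n = sum(counts)
    probs = fit_probs(counts)
    g_obs = g_statistic(counts, probs)
    exceed = 0
    for _ in range(n_boot):
        # Simulate scores from the fitted model, refit, and recompute G.
        sample = rng.choices(SCALE, weights=probs, k=n)
        sim = [sample.count(s) for s in SCALE]
        if g_statistic(sim, fit_probs(sim)) >= g_obs:
            exceed += 1
    return (1 + exceed) / (1 + n_boot)


def pp_points(p_values, alphas):
    """For each significance level, the fraction of stimuli with p <= alpha."""
    return [
        (a, sum(p <= a for p in p_values) / len(p_values)) for a in alphas
    ]


if __name__ == "__main__":
    # Hypothetical score counts per stimulus (how many raters gave 1..5).
    stimuli = [[1, 4, 6, 4, 1], [0, 2, 8, 9, 1], [10, 0, 0, 0, 10]]
    p_values = [bootstrap_p_value(c, seed=i) for i, c in enumerate(stimuli)]
    for alpha, fraction in pp_points(p_values, [0.01, 0.05, 0.1, 0.2]):
        print(f"alpha={alpha:.2f}  fraction of stimuli with p<=alpha: {fraction:.2f}")
```

In this sketch, points that rise well above the diagonal (fraction much larger than alpha) would suggest that more stimuli deviate from the fitted model than chance alone explains. Stimuli with small p-values, such as the bimodal third example, are the candidates for an irregular score distribution.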

research
08/07/2023

Screen-based 3D Subjective Experiment Software

Recently, widespread 3D graphics (e.g., point clouds and meshes) have dr...
research
09/01/2022

Reproducibility Companion Paper: Describing Subjective Experiment Consistency by p-Value P-P Plot

In this paper we reproduce experimental results presented in our earlier...
research
09/20/2021

Inconsistency in Conference Peer Review: Revisiting the 2014 NeurIPS Experiment

In this paper we revisit the 2014 NeurIPS experiment that examined incon...
research
12/11/2022

Applicability limitations of differentiable full-reference image-quality

Subjective image-quality measurement plays a critical role in the develo...
research
09/10/2019

Generalized Score Distribution

A class of discrete probability distributions contains distributions wit...
research
02/19/2021

Subjective Assessments of Legibility in Ancient Manuscript Images – The SALAMI Dataset

The research field concerned with the digital restoration of degraded wr...