A practical guide and software for analysing pairwise comparison experiments

by Maria Perez-Ortiz, et al.
University of Cambridge

Most popular strategies for capturing subjective human judgments involve constructing a unidimensional relative measurement scale that represents order preferences or judgments about a set of objects or conditions. This information is generally captured either by direct scoring, on a Likert or cardinal scale, or by comparative judgments in pairs or sets. Pairwise comparisons are becoming increasingly popular because the experimental procedure is simple. However, taking full advantage of the collected data requires non-trivial analysis to aggregate the comparison ranks into a quality scale and to analyse the results. This paper explains the process of translating pairwise comparison data into a measurement scale, discusses the benefits and limitations of such scaling methods, and introduces publicly available software written in Matlab. We improve on existing scaling methods by introducing outlier analysis, by providing methods for computing confidence intervals and for statistical testing, and by introducing a prior that reduces estimation error when the number of observers is low. Most of our examples focus on image quality assessment.
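To make the idea of translating pairwise comparison data into a measurement scale concrete, the following is a minimal sketch in Python, not the Matlab software described above. It assumes the classical least-squares solution of Thurstone's Case V model on a complete design (every pair compared at least once); the paper's own estimator may differ. The function name thurstone_case_v, the add-half smoothing, and the example counts are all illustrative assumptions.

```python
from statistics import NormalDist


def thurstone_case_v(counts):
    """Scale n conditions from a matrix of pairwise preference counts.

    counts[i][j] is the number of times condition i was preferred over j.
    Returns one score per condition; only differences between scores are
    meaningful. Assumption: classical least-squares Case V solution,
    used here for illustration only.
    """
    n = len(counts)
    std_normal = NormalDist()  # standard normal, mean 0 and sigma 1
    scores = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue
            total = counts[i][j] + counts[j][i]
            if total == 0:
                continue  # pair never compared: contributes nothing
            # Add-half smoothing (an assumption) keeps the proportion
            # strictly between 0 and 1 so the inverse CDF stays finite.
            p = (counts[i][j] + 0.5) / (total + 1.0)
            z_sum += std_normal.inv_cdf(p)
        scores.append(z_sum / n)
    return scores


# Hypothetical data: three conditions, ten comparisons per pair.
example_counts = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
scores = thurstone_case_v(example_counts)
```

With these made-up counts the first condition receives the highest score and the third the lowest, and the scores sum to zero because the smoothed proportions for each pair are complementary. A sketch like this gives point estimates only; it does not cover the outlier analysis, confidence intervals, statistical tests, or prior mentioned above.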


