Evaluating Artificial Systems for Pairwise Ranking Tasks Sensitive to Individual Differences

05/30/2019 · by Xing Liu, et al.

Owing to advances in deep learning, artificial systems now rival humans in several pattern recognition tasks, such as visual recognition of object categories. However, this holds only for tasks whose correct answers exist independently of human perception. There is another type of task in which the target of prediction is human perception itself, which often exhibits individual differences. In such tasks there is no longer a single "correct" answer to predict, which makes evaluating artificial systems difficult. In this paper, focusing on pairwise ranking tasks that are sensitive to individual differences, we propose an evaluation method. Given a ranking result for multiple item pairs generated by an artificial system, our method quantifies the probability that the same ranking result would be generated by humans, and judges whether it is distinguishable from human-generated results. We introduce a probabilistic model of human ranking behavior and present an efficient computation method for this judgment. To estimate the model parameters accurately from small samples, we present a method that uses confidence scores given by annotators when ranking each item pair. Taking as an example the task of ranking image pairs according to material attributes of objects, we demonstrate how the proposed method works.
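The abstract does not spell out the paper's model or estimation procedure, but the core idea can be illustrated with a simple sketch. Assume each item pair has a probability that a human prefers one item over the other (here a per-pair Bernoulli model with independent pairs, an assumption for illustration only). The likelihood of a system's whole ranking result under this human model can then be compared, by Monte Carlo simulation, against the likelihoods of rankings that simulated humans produce, to judge whether the system's result is distinguishable from human-generated ones:

```python
import numpy as np


def human_agreement_prob(system_choices, human_choice_prob):
    """Probability that a human reproduces the system's choice on every
    pair, assuming independent per-pair Bernoulli choices.

    system_choices   : array of 0/1, the system's pick for each pair
    human_choice_prob: array, P(human picks option 1) for each pair
    """
    sc = np.asarray(system_choices)
    hp = np.asarray(human_choice_prob)
    per_pair = np.where(sc == 1, hp, 1.0 - hp)
    return float(np.prod(per_pair))


def indistinguishable_from_humans(system_choices, human_choice_prob,
                                  n_sim=10000, alpha=0.05, rng=None):
    """Monte Carlo judgment: is the likelihood of the system's ranking
    within the typical range of human-generated rankings?

    Simulates n_sim human ranking results from the same per-pair model
    and checks whether at least a fraction alpha of them are no more
    likely than the system's result.
    """
    rng = np.random.default_rng(rng)
    hp = np.asarray(human_choice_prob)
    sys_lik = human_agreement_prob(system_choices, hp)
    # Simulated human choices: each row is one annotator's ranking result.
    sims = (rng.random((n_sim, len(hp))) < hp).astype(int)
    sim_lik = np.array([human_agreement_prob(s, hp) for s in sims])
    # Fraction of human results no more likely than the system's result.
    frac = np.mean(sim_lik <= sys_lik)
    return frac >= alpha
```

For example, with per-pair human preference probabilities `[0.9, 0.8, 0.7]`, a system that agrees with the majority on every pair has agreement probability 0.9 × 0.8 × 0.7 = 0.504 and is judged indistinguishable from human results. The names and the threshold logic here are illustrative; the paper's actual model and its efficient computation method are described in the full text.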

