Selecting applicants based on multiple ratings: Using binary classification framework as an alternative to inter-rater reliability
Inter-rater reliability (IRR) has been the prevalent measure of quality and precision for ratings from multiple raters. However, applicant selection procedures based on ratings from multiple raters usually result in a binary outcome: an applicant is either selected or not. IRR does not consider this final outcome; instead, it focuses on the ratings of the individual subjects or objects. In this work, we outline how to transform selection procedures into a binary classification framework, and we develop a quantile approximation that connects a measurement model for the ratings with the binary classification framework. The quantile approximation allows us to estimate the probability of correctly selecting the best applicants and to assess error probabilities when evaluating the quality of selection procedures that use ratings from multiple raters. We draw connections between IRR and binary classification metrics, showing that the binary classification metrics depend solely on the IRR coefficient and the proportion of selected applicants. We assess the performance of the quantile approximation in a simulation study and apply it in an example comparing the reliability of multiple grant peer review selection procedures.
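The abstract does not spell out the measurement model, so the following Python sketch is only an illustration under stated assumptions, not the authors' quantile approximation. It assumes a normal true-score model in which an applicant's observed score X correlates with true merit T at the square root of the reliability, and in which both the selection cut-off and the "truly best" cut-off sit at the same (1 - p) quantile, where p is the proportion selected. Under these assumptions the classification metrics are functions of reliability and p alone, which mirrors the claim above. The function names `spearman_brown`, `joint_upper_tail`, and `selection_metrics`, and the use of the Spearman-Brown formula for averaging k raters, are hypothetical choices made for this example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def spearman_brown(icc: float, k: int) -> float:
    """Assumption: reliability of the mean of k ratings, given single-rater reliability icc."""
    return k * icc / (1 + (k - 1) * icc)


def joint_upper_tail(z: float, r: float, steps: int = 4000, width: float = 10.0) -> float:
    """P(T > z and X > z) for a standard bivariate normal (T, X) with correlation r >= 0.

    Integrates phi(t) * P(X > z | T = t) over t in [z, z + width] with Simpson's rule
    (steps must be even).
    """
    if r >= 1.0:
        return 1.0 - STD_NORMAL.cdf(z)  # X == T, so the two selections coincide
    s = sqrt(1.0 - r * r)  # conditional standard deviation of X given T
    h = width / steps
    total = 0.0
    for i in range(steps + 1):
        t = z + i * h
        f = STD_NORMAL.pdf(t) * (1.0 - STD_NORMAL.cdf((z - r * t) / s))
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return total * h / 3.0


def selection_metrics(reliability: float, p: float) -> dict:
    """Hypothetical binary classification metrics when the top proportion p is selected.

    Assumes corr(X, T) = sqrt(reliability) and a common cut-off at the (1 - p) quantile.
    """
    z = STD_NORMAL.inv_cdf(1.0 - p)  # standardized cut-off shared by X and T
    r = sqrt(reliability)
    tp = joint_upper_tail(z, r)  # selected and truly among the best
    fp = p - tp  # selected but not among the best
    fn = p - tp  # among the best but not selected (both groups have size p)
    tn = 1.0 - tp - fp - fn
    return {
        "sensitivity": tp / p,  # equals precision here, since both groups have size p
        "false_positive_rate": fp / (1.0 - p),
        "accuracy": tp + tn,
    }


if __name__ == "__main__":
    # Example: three raters per applicant, top 20% selected.
    for icc in (0.2, 0.5, 0.8):
        m = selection_metrics(spearman_brown(icc, k=3), p=0.2)
        print(f"ICC={icc:.1f}, k=3: sensitivity={m['sensitivity']:.3f}, "
              f"FPR={m['false_positive_rate']:.3f}")
```

With reliability 0 the selection is random and the sensitivity equals p; with reliability 1 every selected applicant is among the truly best. Intermediate values trace how much a given IRR buys in terms of correct selections.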