Measuring agreement among several raters classifying subjects into one-or-more (hierarchical) nominal categories. A generalisation of Fleiss' kappa

03/22/2023
by Filip Moons, et al.

Cohen's and Fleiss' kappa are well-known measures of inter-rater reliability. However, they only allow a rater to select exactly one category for each subject. This is a severe limitation in some research contexts: for example, measuring the inter-rater reliability of a group of psychiatrists who may diagnose each patient with multiple disorders is impossible with these measures. This paper proposes a generalisation of the Fleiss' kappa coefficient that lifts this limitation. Specifically, the proposed κ statistic measures inter-rater reliability between multiple raters classifying subjects into one-or-more nominal categories. These categories can be weighted according to their importance, and the measure can take into account the category hierarchy (e.g., subcategories that only become available once the main category is chosen, such as a primary psychiatric disorder and its sub-disorders; much more complex dependencies between categories are possible as well). The proposed κ statistic can handle missing data and a varying number of raters for subjects or categories. The paper first gives a brief overview of existing methods that allow raters to classify subjects into multiple categories. Next, we derive our proposed measure step-by-step and prove that it equals Fleiss' kappa when a fixed number of raters each choose exactly one category for each subject. The measure was developed to investigate the reliability of a new mathematics assessment method, an example of which is elaborated. The paper concludes with the worked-out example of psychiatrists diagnosing patients with multiple disorders.
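
For reference, the special case mentioned above (a fixed number of raters, each choosing exactly one category per subject) is the classical Fleiss' kappa. The sketch below computes only that classical coefficient; it is not the generalised κ proposed in the paper, whose weighting, hierarchy, and missing-data handling are not reproduced here. The function name `fleiss_kappa` and the count-matrix input layout are illustrative assumptions.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Classical Fleiss' kappa (illustrative helper, not the paper's measure).

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every subject must be rated by the same number of raters,
    and each rater picks exactly one category per subject.
    """
    n_subjects = len(counts)
    if n_subjects == 0:
        raise ValueError("at least one subject is required")

    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two raters per subject are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must have the same number of ratings")

    n_categories = len(counts[0])

    # Share of all assignments that went to each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]

    # Observed agreement per subject: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance

    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")

    return (p_bar - p_e) / (1.0 - p_e)
```

With two subjects and three raters who agree completely on different categories, `fleiss_kappa([[3, 0], [0, 3]])` gives 1.0. A multi-label setting, where a rater may tick several categories for one subject, breaks the equal-row-sum assumption checked above, which is the limitation the paper addresses.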


