The FairCeptron: A Framework for Measuring Human Perceptions of Algorithmic Fairness

02/08/2021
by Georg Ahnert, et al.
RWTH Aachen University

Measures of algorithmic fairness often do not account for human perceptions of fairness that can substantially vary between different sociodemographics and stakeholders. The FairCeptron framework is an approach for studying perceptions of fairness in algorithmic decision making such as in ranking or classification. It supports (i) studying human perceptions of fairness and (ii) comparing these human perceptions with measures of algorithmic fairness. The framework includes fairness scenario generation, fairness perception elicitation and fairness perception analysis. We demonstrate the FairCeptron framework by applying it to a hypothetical university admission context where we collect human perceptions of fairness in the presence of minorities. An implementation of the FairCeptron framework is openly available, and it can easily be adapted to study perceptions of algorithmic fairness in other application contexts. We hope our work paves the way towards elevating the role of studies of human fairness perceptions in the process of designing algorithmic decision making systems.


Motivation

Considering fairness in algorithmic decision-making poses an important challenge [chouldechova2020snapshot]. Different definitions of algorithmic fairness have been proposed, including individual measures [dwork2012fairness] as well as group-based measures for both classification [friedler2019comparative] and ranking decisions [yang2017measuring]. In general, algorithms trade off accuracy and fairness [kearns2019ethical], and group-based fairness measures cannot be simultaneously equalized over all groups [chouldechova2017prediction]. Thus, normative decisions must be made.

One way of approaching these decisions is through an analysis of what is perceived as fair, involving the target population of a deciding algorithm in its creation. This could increase the acceptance of algorithmic decision making [awad2018moral]. Involvement also benefits procedural fairness, which is often the most important contributor to overall fairness perception [ambrose2015overall]. Previous research investigated perceptions of algorithmic fairness [saxena2019fairness; srivastava2019mathematical; harrison2020empirical], but focused on classification and predominantly on optimal decisions.

Psychological research suggests that fairness perceptions are influenced by social context [engstrom2020justification]. Fairness perception has been found to differ between genders [dulebohn2016gender], cultures [blake2015ontogeny], and people with different personality traits [truxillo2006field; wiesenfeld2007more]. These differences are currently not accounted for in the fairness measures commonly used in computer science.

In this paper we present the FairCeptron framework for studying fairness perceptions. It supports studying classification and ranking decisions that do not necessarily optimize for a single fairness measure. With the FairCeptron, obligatory trade-offs between accuracy and multiple fairness measures can be investigated, and the nature of the relationships between fairness perceptions and fairness measures can be determined. An implementation is available as open source and is built for easy deployment and adaptation to different study contexts.

Figure 1: (A) A FairCeptron ranking scenario. Participants are shown an algorithmic ranking scenario and rate its perceived fairness on a visual analogue scale. In addition to ranking, classification scenarios are also supported.
(B) Perceptions of fairness across different ranking scenarios. All scenarios are binned by ordering utility [zehlike2017fa] and gender representation (adapted from [yang2017measuring]). Participants were mainly influenced by ordering utility. Higher ratings for over-representation of women vs. men can be seen in scenarios with ordering utility .

The FairCeptron Framework

The FairCeptron framework consists of three components: (i) generation of fairness scenarios according to a prespecified algorithm, (ii) presentation of the scenarios to survey participants and collection of their subjective fairness ratings, and (iii) analysis of the responses that takes into account characteristics of the scenarios (e.g. group sizes) and characteristics of the participants (e.g. sociodemographics or attitudes). The FairCeptron framework can be implemented in various ways; in this paper we present one particular implementation.

Fairness scenario generation

Algorithmic ranking and classification scenarios are generated that consist of personas from two or more groups, each of which can optionally have a second, numeric attribute associated with it. We provide simple code examples for scenario generation in Python. The scenarios are generated as all possible selections from, or permutations of, the personas, in which personas within a group are selected or ranked by qualification. The scenarios are clustered along multiple measures of algorithmic fairness, ensuring that each participant later receives a variety of scenarios while maximizing the total number of scenarios that are tested.
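The generation step can be sketched as follows for the two-group classification case. This is a minimal illustration, not the actual FairCeptron code: function names, dictionary keys, and the choice of representation gap as the clustering measure are our assumptions here. Because personas within a group are always selected by qualification, a classification scenario is fully described by how many personas each group contributes.

```python
from collections import defaultdict

def selection_scenarios(n_women, n_men, k):
    """Enumerate all classification scenarios that select k personas.
    Within each group the most qualified personas are selected first,
    so a scenario reduces to a per-group selection count."""
    return [
        {"women_selected": w, "men_selected": k - w}
        for w in range(max(0, k - n_men), min(n_women, k) + 1)
    ]

def representation_gap(scenario, k):
    """A simple group-fairness measure: normalized selection imbalance."""
    return abs(scenario["women_selected"] - scenario["men_selected"]) / k

def cluster_by(scenarios, measure, n_bins=3):
    """Cluster scenarios into equal-width bins of a fairness measure so
    that each participant can later be served a variety of scenarios."""
    clusters = defaultdict(list)
    for s in scenarios:
        b = min(int(measure(s) * n_bins), n_bins - 1)
        clusters[b].append(s)
    return dict(clusters)
```

For ranking scenarios the same idea applies with permutations of group labels instead of selection counts, since the within-group order is again fixed by qualification.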

Fairness perception elicitation

Participants take part through a responsive, universal web application, as shown in Fig. 1 (A). For each new participant, the application selects one random fairness scenario from each pre-defined cluster of scenarios and then shuffles the selected scenarios. For every scenario, a description and an illustration are shown. Participants rate each scenario on an initially blank visual analogue scale (VAS) ranging from very unfair to very fair. A dynamic indicator is added to the VAS to improve accuracy with minimal additional bias [matejka2016effect]. The time to answer and the uncertainty in answering, measured as the sum of differences between non-final ratings, are stored alongside the final answer. Sociodemographics and attitudes can also be elicited.
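The per-participant sampling logic described above can be sketched in a few lines. This is an illustrative stand-in for the web application's server-side selection, assuming `clusters` maps a cluster id to its list of scenarios; the function name is hypothetical.

```python
import random

def scenarios_for_participant(clusters, rng=None):
    """Draw one random scenario from each pre-defined cluster, then
    shuffle the draws so that cluster order does not leak into the
    order in which scenarios are presented to the participant."""
    rng = rng or random.Random()
    drawn = [rng.choice(scenarios) for scenarios in clusters.values()]
    rng.shuffle(drawn)
    return drawn
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which is convenient when debugging a study deployment.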

Fairness perception analysis

The obtained data can be exported from MongoDB in CSV or JSON format. We provide evaluation examples written with common Python frameworks for the analyses listed above. Heatmaps that compare fairness ratings on scenarios grouped by two distinct measures can easily be generated, as shown in Fig. 1 (B).
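The aggregation behind such a heatmap amounts to binning each rated scenario by two fairness measures and averaging the ratings per cell. The sketch below shows this with standard-library Python only; the field names (`ordering_utility`, `representation`, `rating`) are illustrative, and the actual export schema may differ.

```python
from collections import defaultdict

def bin_index(value, n_bins=3):
    """Map a measure in [0, 1] onto one of n_bins equal-width bins."""
    return min(int(value * n_bins), n_bins - 1)

def rating_heatmap(rows, n_bins=3):
    """Mean fairness rating per (ordering-utility bin, representation
    bin) cell, mirroring the aggregation plotted in Fig. 1 (B).
    `rows` holds one dict per exported rating."""
    cells = defaultdict(list)
    for r in rows:
        key = (bin_index(r["ordering_utility"], n_bins),
               bin_index(r["representation"], n_bins))
        cells[key].append(r["rating"])
    return {key: sum(v) / len(v) for key, v in cells.items()}
```

The resulting cell-to-mean dictionary can then be reshaped into a matrix and rendered with a plotting library such as matplotlib or seaborn.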

Demonstration

For demonstration purposes, we applied the FairCeptron framework using a voluntary response sample of 136 people. The hypothetical scenarios concern a university admission process. All scenarios displayed 10 female/male student applicants with associated qualification scores. Each participant was asked to rate 10 classification and 10 ranking scenarios. Participants additionally answered questions about their demographics and their attitudes towards deciding machines (adapted from [awad2018moral]), and took a big-five personality short test [rammstedt2007measuring].

Fig. 1 (B) illustrates the fairness perceptions aggregated from the ranking scenarios of the FairCeptron study. In general, participants rated scenarios according to their ordering utility. The highlighted exemplary bin, containing scenarios that partially violate qualification order and in which men are over-represented, is rated unfair on average. Ratings differ by participant gender and political orientation, in particular regarding the acceptance of over-representing female personas. These findings serve only as illustration and were obtained from a non-representative sample. The demo at ICWSM will include a walk-through of scenario generation, perception elicitation, and analysis.

FairCeptron studies can easily be deployed with little effort by building upon the existing implementation. The framework allows investigating whether fairness perceptions depend on domains (e.g. education, medicine, finance), sociodemographics (e.g. gender, occupation), or the stakes involved (high- vs. low-stakes decisions). The results obtained from FairCeptron studies could empirically inform the selection and evaluation of fairness measures in real-world settings. We hope our framework represents a stepping stone towards a future in which the people subjected to algorithmic decision making contribute to its design process, and in which algorithmic notions of fairness are subjected to empirical studies of human fairness perceptions before implementation and roll-out.

In summary, we present a framework for studying perceptions of fairness in algorithmic decision making, such as in ranking or classification, that includes fairness scenario generation, fairness perception elicitation, and fairness perception analysis steps. Our implementation of the framework is available on GitHub as open source.

References