# Eliciting and Enforcing Subjective Individual Fairness

We revisit the notion of individual fairness first proposed by Dwork et al. [2012], which asks that "similar individuals should be treated similarly". A primary difficulty with this definition is that it assumes a completely specified fairness metric for the task at hand. In contrast, we consider a framework for fairness elicitation, in which fairness is indirectly specified only via a sample of pairs of individuals who should be treated (approximately) equally on the task. We make no assumption that these pairs are consistent with any metric. We provide a provably convergent oracle-efficient algorithm for minimizing error subject to the fairness constraints, and prove generalization theorems for both accuracy and fairness. Since the constrained pairs could be elicited either from a panel of judges, or from particular individuals, our framework provides a means for algorithmically enforcing subjective notions of fairness. We report on preliminary findings of a behavioral study of subjective fairness using human-subject fairness constraints elicited on the COMPAS criminal recidivism dataset.

## 1 Introduction

Individual Fairness for algorithmic decision making was originally formulated as the compelling idea that “similar individuals should be treated similarly” by Dwork et al. (2012). In its original formulation, “similarity” was determined by a task-specific metric on individuals, which would be provided to the algorithm designer. Since then, the formulation of this task-specific fairness metric has been the primary obstacle that has stood in the way of adoption and further development of this conception of individual fairness. This is for two important reasons:

1. First, although people might have strong intuitions about what kinds of decisions are unfair, it is difficult for them to distill these intuitions into a concisely defined quantitative measure.

2. Second, different people disagree on what constitutes “fairness”. Even if particular individuals were able to distill their intuitive notions of fairness into some quantitative measure, there is no reason to expect those measures to be consistent with one another, or even internally consistent.

In this work, we propose a practical but rigorous approach aimed at circumventing this difficulty, while staying close to the original idea that “similar individuals should be treated similarly”. We are motivated by the following idea: Even if people cannot distill their conception of fairness as a quantitative metric, they can still be asked to express their opinion about whether particular pairs of individuals should be treated similarly or not. Thus, one could choose a panel of “judges”, or even a particular person, and elicit opinions from them about whether certain pairs of decisions were fair or not. There is no reason to suspect that these pairwise opinions will be consistent in any sense, or that they will form a metric. Nevertheless, once such a set of pairwise fairness constraints has been elicited, and once a data distribution and hypothesis class are fixed, there is a well-defined learning problem: minimize classification error subject to the constraint that the violation of the specified pairs is held below some fixed threshold. By varying this threshold, we can in principle define a Pareto frontier of classifiers, optimally trading off error with the elicited conception of individual fairness — without ever having to commit to a restricted class of fairness notions. We would like to find the classifiers that realize this Pareto frontier. In this paper, we solve the computational, statistical, and conceptual issues necessary to do this, and demonstrate the effectiveness of our approach via a behavioral study.

### 1.1 Results

#### Our Model

We model individuals as having features in $\mathcal{X}$ and binary labels, drawn from some distribution $\mathcal{P}$. A committee of judges $u \in \mathcal{U}$ has preferences that certain individuals should be treated the same way by a classifier, i.e. that the probability that they are given a positive label should be the same. (Though we develop our formalism for a committee of judges, it permits the special case of a single subjective judge, which we make use of in our behavioral study.) We represent these preferences abstractly as a set of pairs $C_u \subseteq \mathcal{X} \times \mathcal{X}$ for each judge $u$, where $(x, x') \in C_u$ represents that judge $u$ would view it as unfair if individuals $x$ and $x'$ were treated substantially differently (i.e. given a positive classification with a substantially different probability). We impose no structure on how judges form their views, or on the relationship between the views of different judges: the sets $C_u$ are allowed to be arbitrary (for example, they need not satisfy a triangle inequality), and need not be mutually consistent. We write $C = \bigcup_{u} C_u$.

We then formulate a constrained optimization problem that has two different “knobs” with which we can quantitatively relax our fairness constraint. Suppose that we say that a $\gamma$-fairness violation corresponds to classifying a pair of individuals $(x, x') \in C$ such that their probabilities of receiving a positive label differ by more than $\gamma$ (our first knob): $\left|\mathbb{E}_{h \sim D}[h(x) - h(x')]\right| > \gamma$. In this expression, the expectation is taken only over the randomness of the classifier $D$. We might ask that for no pair of individuals do we have a $\gamma$-fairness violation: $\max_{(x,x') \in C} \left|\mathbb{E}_{h \sim D}[h(x) - h(x')]\right| \le \gamma$. On the other hand, we could ask for the weaker constraint that over a random draw of a pair of individuals, the expected fairness violation is at most $\eta$ (our second knob): $\mathbb{E}_{(x,x')}\left[\left|\mathbb{E}_{h \sim D}[h(x) - h(x')]\right|\right] \le \eta$. We can also combine both relaxations to ask that, in expectation over random pairs, the “excess” fairness violation, on top of an allowed budget of $\gamma$, is at most $\eta$. Subject to these constraints, we would like to find the distribution over classifiers that minimizes classification error: given a setting of the parameters $\gamma$ and $\eta$, this defines a benchmark with which we would like to compete.

#### Our Theoretical Results

Even absent fairness constraints, learning to minimize 0/1 loss (even over linear classifiers) is computationally hard in the worst case (see e.g. Feldman et al. (2012, 2009)). Despite this, learning seems to be empirically tractable in most cases. To capture the additional hardness of learning subject to fairness constraints, we follow several recent papers (Agarwal et al., 2018; Kearns et al., 2018) in aiming to develop oracle-efficient learning algorithms. Oracle-efficient algorithms are assumed to have access to an oracle (realized in experiments using a heuristic; see the next section) that can solve weighted classification problems. Given access to such an oracle, oracle-efficient algorithms must run in polynomial time. We show that our fairness-constrained learning problem is computationally no harder than unconstrained learning by giving such an oracle-efficient algorithm (or reduction), and show moreover that its guarantees generalize from in-sample to out-of-sample in the usual way, with respect to both accuracy and the frequency and magnitude of fairness violations. Our algorithm is simple and amenable to implementation, and we use it in our experimental results.

#### Our Experimental Results

Finally, we implement our algorithm and run a set of experiments on the COMPAS recidivism prediction dataset, using fairness constraints elicited from 43 human subjects. We establish that our algorithm converges quickly (even when implemented with fast learning heuristics, rather than “oracles”). We also explore the Pareto curves trading off error and fairness violations for different human judges, and find empirically that there is a great deal of variability across subjects in terms of their conception of fairness, and in terms of the degree to which their expressed preferences are in conflict with accurate prediction. Finally we find that most of the difficulty in balancing accuracy with the elicited fairness constraints can be attributed to a small fraction of the reported constraints.

### 1.2 Related work

Dwork et al. (2012) first proposed the notion of individual metric-fairness that we take inspiration from, imagining fairness as a Lipschitz constraint on a randomized algorithm, with respect to some “task-specific metric” to be provided to the algorithm designer. Since the original proposal, the question of where the fairness metric should come from has been one of the primary obstacles to its adoption, and the focus of subsequent work. Zemel et al. (2013) attempt to automatically learn a representation for the data (and hence, implicitly, a similarity metric) that causes a classifier to label an equal proportion of two protected groups as positive. They provide a heuristic approach and an experimental evaluation. Kim et al. (2018) consider a group-fairness-like relaxation of individual metric-fairness, asking that on average, individuals in pre-specified groups are classified with probabilities proportional to the average distance between individuals in those groups. They show how to learn such classifiers given access to an oracle which can evaluate the distance between two individuals according to the metric. Compared to our work, they assume the existence of an exact fairness metric which can be accessed using a quantitative oracle, and they use this metric to define a statistical rather than individual notion of fairness.

Most related to our work, Gillen et al. (2018) assume access to an oracle which simply identifies fairness violations across pairs of individuals. Under the assumption that the oracle is exactly consistent with a metric in a simple linear class, Gillen et al. (2018) give a polynomial time algorithm to compete with the best fair policy in an online linear contextual bandits problem. In contrast to the unrealistic assumptions that Gillen et al. (2018) are forced to make in order to derive a polynomial time algorithm (consistency with a simple class of metrics), we make essentially no assumptions at all on the structure of the “fairness” constraints. Ilvento (2019) studies the problem of metric learning with the goal of using only a small number of numeric-valued queries, which are hard for human beings to answer, relying more on comparison queries. Finally, Rothblum and Yona (2018) prove similar generalization guarantees to ours in the context of individual metric-fairness. In the setting that they consider, the metric fairness constraint is given.

## 2 Problem formulation

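Let $S = \{(x_i, y_i)\}_{i=1}^{n}$ denote a set of $n$ labeled examples where $x_i \in \mathcal{X}$ is a feature vector and $y_i \in \mathcal{Y}$ is a label. We will also write $S_X = \{x_i\}_{i=1}^{n}$ and $S_Y = \{y_i\}_{i=1}^{n}$. Throughout the paper, we will restrict attention to binary labels, so let $\mathcal{Y} = \{0, 1\}$. Let $\mathcal{P}$ denote the unknown distribution over $\mathcal{X} \times \mathcal{Y}$. Let $\mathcal{H}$ denote a hypothesis class containing binary classifiers $h : \mathcal{X} \to \mathcal{Y}$. We assume that $\mathcal{H}$ contains a constant classifier (which will imply that the “fairness constrained” ERM problem that we define is always feasible). We denote the classification error of hypothesis $h$ by $\mathrm{err}(h, \mathcal{P}) = \Pr_{(x,y) \sim \mathcal{P}}[h(x) \ne y]$ and its empirical classification error by $\mathrm{err}(h, S) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(h(x_i) \ne y_i)$.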

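We assume there is a set of one or more judges $\mathcal{U}$, such that each judge $u \in \mathcal{U}$ is identified with a set of pairs of individuals $C_u \subseteq \mathcal{X} \times \mathcal{X}$ that she thinks should be “treated similarly”, i.e. ideally that for the learned classifier $h$, $h(x) = h(x')$ for every $(x, x') \in C_u$ (we will ask that this hold in expectation if the classifier is randomized, and will relax it in various ways). For each pair $(x, x')$, let $w_{x,x'}$ be the fraction of judges who would like individuals $x$ and $x'$ to be treated similarly, that is, $w_{x,x'} = \frac{|\{u \in \mathcal{U} : (x, x') \in C_u\}|}{|\mathcal{U}|}$. Note that $w_{x,x'} = w_{x',x}$.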

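In practice, we will not have direct access to the sets of pairs $C_u$ corresponding to the judges $u$, but we may ask them whether particular pairs are in this set (see Section 5 for details about how we actually query human subjects). We model this by imagining that we present each judge with a random set of pairs $A \subseteq [n]^2$ (we will always assume that this pair set is closed under symmetry) and, for each pair $(i, j) \in A$, ask if the pair $(x_i, x_j)$ should be treated similarly or not; we learn the set of pairs in $A \cap C_u$ for each $u$. Define the empirical constraint set $\hat{C}_u = \{(x_i, x_j) \in C_u : (i, j) \in A\}$, with $\hat{C} = \bigcup_u \hat{C}_u$, and $\hat{w}_{ij} = \frac{|\{u \in \mathcal{U} : (x_i, x_j) \in \hat{C}_u\}|}{|\mathcal{U}|}$ if $(i, j) \in A$ and $0$ otherwise. For simplicity, we will sometimes write $w$ instead of $\hat{w}$. Note that $\hat{w}_{ij} = \hat{w}_{ji}$ for every $(i, j)$.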

Our goal will be to find the distribution over classifiers from $\mathcal{H}$ that minimizes classification error, while satisfying the judges’ fairness requirement $C$. To do so, we will try to find $D$, a probability distribution over $\mathcal{H}$, that minimizes the training error and satisfies the judges’ empirical fairness constraints, $\hat{C}$. For convenience, we denote $D$’s expected classification error as $\mathrm{err}(D, \mathcal{P}) = \mathbb{E}_{h \sim D}[\mathrm{err}(h, \mathcal{P})]$ and likewise its expected empirical classification error as $\mathrm{err}(D, S) = \mathbb{E}_{h \sim D}[\mathrm{err}(h, S)]$. We say that any distribution $D$ over classifiers satisfies $(\gamma, \eta)$-approximate subjective fairness if it is a feasible solution to the following constrained empirical risk minimization problem:

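$$
\begin{aligned}
\min_{D \in \Delta\mathcal{H},\ \alpha_{ij} \ge 0} \quad & \mathrm{err}(D, S) && (1) \\
\text{such that} \quad & \forall (i,j) \in [n]^2: \ \mathbb{E}_{h \sim D}\left[h(x_i) - h(x_j)\right] \le \alpha_{ij} + \gamma && (2) \\
& \sum_{(i,j) \in [n]^2} \frac{\hat{w}_{ij}\,\alpha_{ij}}{|A|} \le \eta. && (3)
\end{aligned}
$$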

This “Fair ERM” problem, whose feasible region we denote by $\Omega(S, \hat{w}, \gamma, \eta)$, has decision variables $D$ and $\alpha = \{\alpha_{ij}\}$, representing the distribution over classifiers and the “fairness violation” terms for each pair of training points, respectively. The parameters $\gamma$ and $\eta$ are constants which represent the two different “knobs” we have at our disposal to quantitatively relax the fairness constraint, in an $\ell_\infty$ and $\ell_1$ sense respectively. To understand each of them, it helps to consider them in isolation.

First, imagine that we set $\eta = 0$. Then $\gamma$ controls the worst-case disparity between the probabilities with which the two members of any constrained pair are classified as positive. (We have constraints for every pair $(i, j)$, not just those in $\hat{C}$, but the additional constraints are not binding: if $\hat{w}_{ij} = 0$, a solution to the above program is free to set the slack parameter $\alpha_{ij} = 1$ for any such pair. When $\eta = 0$, the slack parameter $\alpha_{ij}$ is constrained to be $0$ whenever $\hat{w}_{ij} > 0$, i.e. whenever $(x_i, x_j) \in \hat{C}$.)

Next, imagine that $\gamma = 0$. The parameter $\eta$ controls the expected difference in probability that a randomly selected pair is classified positively, weighted by the number of judges who feel they should be classified the same way, i.e. the expected degree of dissatisfaction of the panel of judges, over the random choice of a pair of individuals and the randomness of their classification. (To see this, recall that $\alpha_{ij} \ge 0$, so constraint (2) can be rewritten as $\alpha_{ij} \ge \max\left(0, \mathbb{E}_{h \sim D}[h(x_i) - h(x_j)] - \gamma\right)$, and $\hat{w}_{ij} = 0$ if $(i, j) \notin A$, so the sum in constraint (3) can equivalently be taken over $A$ rather than $[n]^2$.)
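As a minimal sketch of how constraints (2) and (3) interact, the snippet below computes the smallest slack that constraint (2) allows for a given vector of positive-classification probabilities, and then checks the budget in constraint (3). The function names `min_slack` and `is_feasible`, and the argument `num_queried_pairs` (standing in for $|A|$), are illustrative assumptions rather than part of the paper.

```python
import numpy as np

def min_slack(probs, gamma):
    """Smallest alpha_ij >= 0 satisfying constraint (2) for every ordered pair.

    probs[i] is assumed to hold E_{h~D}[h(x_i)], the probability that x_i
    receives a positive label under the randomized classifier D.
    """
    diff = probs[:, None] - probs[None, :]   # diff[i, j] = p_i - p_j
    return np.clip(diff - gamma, 0.0, None)

def is_feasible(probs, w_hat, num_queried_pairs, gamma, eta):
    """Check constraint (3) when alpha is set to the smallest feasible slack."""
    alpha = min_slack(probs, gamma)
    return float((w_hat * alpha).sum() / num_queried_pairs) <= eta
```

Because a constant classifier gives every individual the same probability, its minimal slack is zero everywhere, which is one way to see why the program is always feasible.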

### 2.1 Fairness loss

Our goal is to develop an algorithm that will minimize its empirical error $\mathrm{err}(D, S)$, while satisfying the empirical fairness constraints $\hat{C}$. The standard VC dimension argument states that empirical classification error will concentrate around the true classification error, and we hope to show the same kind of generalization for fairness as well. To do so, we first define the fairness loss.

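For some fixed randomized hypothesis $D \in \Delta\mathcal{H}$ and weights $w$, define the $(w, \gamma)$-fairness loss between a pair $(x, x')$ as

$$
\Pi_{D,w,\gamma}\big((x, x')\big) = w_{x,x'} \max\left(0, \left|\mathbb{E}_{h \sim D}\left[h(x) - h(x')\right]\right| - \gamma\right).
$$

For a set of pairs $M$, the $(w, \gamma)$-fairness loss of $M$ is defined to be:

$$
\Pi_{D,w,\gamma}(M) = \frac{1}{|M|} \sum_{(x,x') \in M} \Pi_{D,w,\gamma}\big((x, x')\big).
$$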

This is the expected degree to which the difference in classification probability for a randomly selected pair exceeds the allowable budget $\gamma$, weighted by the fraction of judges who think that the pair should be treated similarly. By construction, the empirical fairness loss is bounded by $\eta$, and, as we show in Section 4, the empirical fairness loss concentrates around the true fairness loss.
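The definition can be sketched directly in code. The example below assumes that the randomized hypothesis is summarized by a vector `probs` of positive-classification probabilities and that pairs are given as index tuples; the name `fairness_loss` and this representation are illustrative choices, not the paper's implementation.

```python
import numpy as np

def fairness_loss(probs, w, pairs, gamma):
    """(w, gamma)-fairness loss averaged over a list of index pairs (i, j).

    probs[i] is assumed to be E_{h~D}[h(x_i)]; w[i, j] is the fraction of
    judges who want x_i and x_j treated similarly.
    """
    total = 0.0
    for i, j in pairs:
        # Excess disparity beyond the allowed budget gamma, floored at zero.
        excess = max(0.0, abs(probs[i] - probs[j]) - gamma)
        total += w[i, j] * excess
    return total / len(pairs)
```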

### 2.2 Cost-sensitive classification

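In our algorithm, we will make use of a cost-sensitive classification (CSC) oracle. An instance of the CSC problem can be described by a set of costs $\{(x_i, c_i^0, c_i^1)\}_{i=1}^{n}$ and a hypothesis class $\mathcal{H}$. The costs $c_i^0$ and $c_i^1$ correspond to the cost of labeling $x_i$ as 0 and 1 respectively. Invoking a CSC oracle on $\{(x_i, c_i^0, c_i^1)\}_{i=1}^{n}$ returns a hypothesis $h^*$ such that

$$
h^* \in \arg\min_{h \in \mathcal{H}} \sum_{i=1}^{n} \left( h(x_i)\, c_i^1 + (1 - h(x_i))\, c_i^0 \right).
$$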

We say that an algorithm is oracle-efficient if it runs in polynomial time assuming access to a CSC oracle.
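For intuition only, a CSC oracle can be realized by brute force when the hypothesis class is small and finite. The sketch below assumes that the class is given as a 0/1 matrix `predictions` with one row per hypothesis; this representation and the name `csc_oracle` are assumptions made for illustration, and the experiments in the paper use a learning heuristic instead.

```python
import numpy as np

def csc_oracle(predictions, cost0, cost1):
    """Return the index of the cost-minimizing hypothesis in a finite class.

    predictions[k, i] in {0, 1} is the label hypothesis k assigns to x_i.
    cost0[i] and cost1[i] are the costs of labeling x_i as 0 and 1.
    """
    # Total cost of hypothesis k: sum_i h(x_i) * c1_i + (1 - h(x_i)) * c0_i.
    total = predictions @ cost1 + (1 - predictions) @ cost0
    return int(np.argmin(total))
```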

## 3 Empirical risk minimization

In this section, we give an oracle-efficient algorithm for approximately solving our (in-sample) constrained empirical risk minimization problem.

### 3.1 Outline of the solution

We frame the problem of solving our constrained ERM problem as finding an approximate equilibrium of a zero-sum game between a primal player and a dual player, trying to minimize and maximize respectively the Lagrangian of the constrained optimization problem.

The Lagrangian for our optimization problem is

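$$
L(D, \alpha, \lambda, \tau) = \mathrm{err}(D, S) + \sum_{(i,j) \in [n]^2} \lambda_{ij} \left( \mathbb{E}_{h \sim D}\left[h(x_i) - h(x_j)\right] - \alpha_{ij} - \gamma \right) + \tau \left( \frac{1}{|A|} \sum_{(i,j) \in [n]^2} w_{ij}\,\alpha_{ij} - \eta \right) \tag{4}
$$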

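For the constraint in equation (2), corresponding to each pair of individuals $(i, j)$, we introduce a dual variable $\lambda_{ij}$. For the constraint (3), we introduce a dual variable $\tau$. The primal player’s action space is $(D, \alpha) \in \Delta\mathcal{H} \times [0,1]^{n^2}$, and the dual player’s action space is $(\lambda, \tau) \in \mathbb{R}^{n^2} \times \mathbb{R}$.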
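To make the three terms of the Lagrangian concrete, the following sketch evaluates it for given primal and dual variables. It assumes the randomized classifier is summarized by its empirical error `err` and a vector `probs` of positive-classification probabilities; the function name and argument layout are illustrative assumptions.

```python
import numpy as np

def lagrangian(err, probs, alpha, lam, tau, w, num_queried_pairs, gamma, eta):
    """Evaluate L(D, alpha, lambda, tau) from equation (4).

    probs[i] is assumed to be E_{h~D}[h(x_i)]; alpha, lam and w are n-by-n
    arrays indexed by ordered pairs; num_queried_pairs stands in for |A|.
    """
    diff = probs[:, None] - probs[None, :]            # E[h(x_i) - h(x_j)]
    pair_term = (lam * (diff - alpha - gamma)).sum()  # constraint (2) terms
    budget_term = tau * ((w * alpha).sum() / num_queried_pairs - eta)
    return err + pair_term + budget_term
```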

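Solving our constrained ERM problem is equivalent to finding a minmax equilibrium of $L$:

$$
\arg\min_{(D,\alpha) \in \Omega(S, \hat{w}, \gamma, \eta)} \mathrm{err}(D, S) = \arg\min_{D \in \Delta\mathcal{H},\ \alpha \in [0,1]^{n^2}} \ \max_{\lambda \in \mathbb{R}^{n^2},\ \tau \in \mathbb{R}} L(D, \alpha, \lambda, \tau).
$$

Because $L$ is linear in terms of its parameters, Sion’s minimax theorem (Sion, 1958) gives us

$$
\min_{D \in \Delta\mathcal{H},\ \alpha \in [0,1]^{n^2}} \ \max_{\lambda \in \mathbb{R}^{n^2},\ \tau \in \mathbb{R}} L(D, \alpha, \lambda, \tau) = \max_{\lambda \in \mathbb{R}^{n^2},\ \tau \in \mathbb{R}} \ \min_{D \in \Delta\mathcal{H},\ \alpha \in [0,1]^{n^2}} L(D, \alpha, \lambda, \tau).
$$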

By a classic result of Freund and Schapire (1996), one can compute an approximate equilibrium by simulating “no-regret” dynamics between the primal and dual player. Our algorithm can be viewed as simulating the following no-regret dynamics between the primal and the dual players over $T$ rounds. In each of the $T$ rounds, the dual player updates the dual variables $(\lambda, \tau)$ according to no-regret learning algorithms (exponentiated gradient descent (Kivinen and Warmuth, 1997) for $\lambda$ and online gradient descent (Zinkevich, 2003) for $\tau$). At every round, the primal player then best responds with a pair $(D, \alpha)$ using a CSC oracle. The time-averaged play of both players converges to an approximate equilibrium of the zero-sum game, where the approximation is controlled by the regret of the dual player.

### 3.2 Primal player’s best response

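In each round $t$, given the actions chosen by the dual player $(\lambda^t, \tau^t)$, the primal player needs to best respond by choosing $(D^t, \alpha^t)$ such that

$$
(D^t, \alpha^t) \in \arg\min_{D \in \Delta\mathcal{H},\ \alpha \in [0,1]^{n^2}} L(D, \alpha, \lambda^t, \tau^t).
$$

We do so by leveraging a CSC oracle. Given $\lambda^t$, we can set the costs as follows:

$$
c_i^0 = \frac{1}{n} \mathbb{1}(y_i \ne 0) \quad \text{and} \quad c_i^1 = \frac{1}{n} \mathbb{1}(y_i \ne 1) + \sum_{j \ne i} \left( \lambda^t_{ij} - \lambda^t_{ji} \right).
$$

Then $D^t = h^t$, where $h^t$ is the hypothesis returned by the CSC oracle on $\{(x_i, c_i^0, c_i^1)\}_{i=1}^{n}$ (we note that the best response is always a deterministic classifier $h^t$). As for $\alpha^t$, we set $\alpha^t_{ij} = 1$ if $\tau^t \frac{w_{ij}}{|A|} - \lambda^t_{ij} \le 0$ and $0$ otherwise.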
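The two parts of the primal best response can be sketched as follows. The helper names `best_response_costs` and `best_response_alpha` are hypothetical; the costs returned by the first would then be passed to whatever CSC oracle is available.

```python
import numpy as np

def best_response_costs(y, lam):
    """Build the CSC costs (c0, c1) for the current dual variables lam.

    lam[i, j] is the dual variable for the ordered pair (i, j). Summing over
    all j rather than j != i is equivalent, since the diagonal terms cancel.
    """
    y = np.asarray(y)
    n = len(y)
    net = lam.sum(axis=1) - lam.sum(axis=0)   # sum_j (lam_ij - lam_ji)
    cost0 = (y != 0).astype(float) / n        # error cost of predicting 0
    cost1 = (y != 1).astype(float) / n + net  # error cost of predicting 1, plus dual term
    return cost0, cost1

def best_response_alpha(lam, tau, w, num_queried_pairs):
    """Set alpha_ij = 1 when its coefficient tau * w_ij / |A| - lam_ij is <= 0."""
    return (tau * w / num_queried_pairs - lam <= 0).astype(float)
```

Ties (a coefficient of exactly zero) are resolved toward $\alpha_{ij} = 1$ here, matching the rule stated above; either choice attains the same objective value.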

###### Lemma 3.1.

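For fixed $\lambda, \tau$, the best response optimization for the primal player is separable, i.e.

$$
\arg\min_{D, \alpha} L(D, \alpha, \lambda, \tau) = \arg\min_{D} L^{\rho_1}_{\lambda,\tau}(D) \times \arg\min_{\alpha} L^{\rho_2}_{\lambda,\tau}(\alpha),
$$

where

$$
L^{\rho_1}_{\lambda,\tau}(D) = \mathrm{err}(D, S) + \sum_{(i,j) \in [n]^2} \lambda_{ij}\, \mathbb{E}_{h \sim D}\left[h(x_i) - h(x_j)\right]
$$

and

$$
L^{\rho_2}_{\lambda,\tau}(\alpha) = \sum_{(i,j) \in [n]^2} \lambda_{ij} (-\alpha_{ij}) + \tau \left( \frac{1}{|A|} \sum_{(i,j) \in [n]^2} w_{ij}\,\alpha_{ij} \right).
$$

###### Proof.

First, note that $\alpha$ is not dependent on $D$ and vice versa, and that the terms $-\gamma \sum_{(i,j)} \lambda_{ij}$ and $-\tau\eta$ depend on neither and so do not affect the minimizer. Thus, we may separate the optimization as follows:

$$
\begin{aligned}
\arg\min_{D, \alpha} L(D, \alpha, \lambda, \tau)
&= \arg\min_{D, \alpha}\ \mathrm{err}(D, S) + \sum_{(i,j) \in [n]^2} \lambda_{ij} \left( \mathbb{E}_{h \sim D}\left[h(x_i) - h(x_j)\right] - \alpha_{ij} - \gamma \right) + \tau \left( \frac{1}{|A|} \sum_{(i,j) \in [n]^2} w_{ij}\,\alpha_{ij} - \eta \right) \\
&= \arg\min_{D} \left[ \mathrm{err}(D, S) + \sum_{(i,j) \in [n]^2} \lambda_{ij}\, \mathbb{E}_{h \sim D}\left[h(x_i) - h(x_j)\right] \right] \times \arg\min_{\alpha} \left[ \sum_{(i,j) \in [n]^2} \lambda_{ij} (-\alpha_{ij}) + \tau \left( \frac{1}{|A|} \sum_{(i,j) \in [n]^2} w_{ij}\,\alpha_{ij} \right) \right] \\
&= \arg\min_{D} L^{\rho_1}_{\lambda,\tau}(D) \times \arg\min_{\alpha} L^{\rho_2}_{\lambda,\tau}(\alpha).
\end{aligned}
$$

∎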

###### Lemma 3.2.

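For fixed $\lambda$ and $\tau$, the output $\alpha$ of the best response rule (set $\alpha_{ij} = 1$ if $\tau \frac{w_{ij}}{|A|} - \lambda_{ij} \le 0$ and $0$ otherwise) minimizes $L^{\rho_2}_{\lambda,\tau}$.

###### Proof.

The optimization is

$$
\begin{aligned}
\arg\min_{\alpha} L^{\rho_2}_{\lambda,\tau}
&= \arg\min_{\alpha} \sum_{(i,j) \in [n]^2} \lambda_{ij} (-\alpha_{ij}) + \tau \left( \frac{1}{|A|} \sum_{(i,j) \in [n]^2} w_{ij}\,\alpha_{ij} \right) \\
&= \arg\min_{\alpha} \sum_{(i,j) \in [n]^2} -\lambda_{ij}\,\alpha_{ij} + \sum_{(i,j) \in [n]^2} \frac{\tau\, w_{ij}}{|A|}\,\alpha_{ij} \\
&= \arg\min_{\alpha} \sum_{(i,j) \in [n]^2} \alpha_{ij} \left( \frac{\tau\, w_{ij}}{|A|} - \lambda_{ij} \right).
\end{aligned}
$$

Note that for any pair $(i, j)$, the term $\frac{\tau\, w_{ij}}{|A|} - \lambda_{ij}$ is a constant. Thus, when the constant $\frac{\tau\, w_{ij}}{|A|} - \lambda_{ij} \le 0$, we assign $\alpha_{ij}$ its maximum bound, $1$, in order to minimize $L^{\rho_2}_{\lambda,\tau}$. Otherwise, when $\frac{\tau\, w_{ij}}{|A|} - \lambda_{ij} > 0$, we assign $\alpha_{ij}$ its minimum bound, $0$. ∎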

###### Lemma 3.3.

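For fixed $\lambda$ and $\tau$, the output of the CSC oracle, invoked with the costs $c_i^0$ and $c_i^1$ defined above, minimizes $L^{\rho_1}_{\lambda,\tau}$.

###### Proof.

$$
\begin{aligned}
\arg\min_{D} L^{\rho_1}_{\lambda,\tau}
&= \arg\min_{D}\ \mathrm{err}(D, S) + \sum_{(i,j) \in [n]^2} \lambda_{ij}\, \mathbb{E}_{h \sim D}\left[h(x_i) - h(x_j)\right] \\
&= \arg\min_{D}\ \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}_{h \sim D}\left[\mathbb{1}(h(x_i) \ne y_i)\right] + \sum_{(i,j) \in [n]^2} \lambda_{ij}\, \mathbb{E}_{h \sim D}\left[h(x_i) - h(x_j)\right] \\
&= \arg\min_{D}\ \sum_{i=1}^{n} \mathbb{E}_{h \sim D}\left[ \frac{1}{n} \mathbb{1}(h(x_i) \ne y_i) + \sum_{j \ne i} \lambda_{ij}\, h(x_i) - \sum_{j \ne i} \lambda_{ji}\, h(x_i) \right] \\
&= \arg\min_{D}\ \sum_{i=1}^{n} \mathbb{E}_{h \sim D}\left[ \frac{1}{n} \mathbb{1}(h(x_i) \ne y_i) + h(x_i) \sum_{j \ne i} \left( \lambda_{ij} - \lambda_{ji} \right) \right].
\end{aligned}
$$

The objective is linear in $D$, so it is minimized by a deterministic classifier $h$. For each $i$ we assign the cost

$$
c_i^{h(x_i)} = \frac{1}{n} \mathbb{1}(h(x_i) \ne y_i) + h(x_i) \sum_{j \ne i} \left( \lambda_{ij} - \lambda_{ji} \right).
$$

Note that the cost depends on whether $h(x_i) = 0$ or $1$. For example, take $h(x_i) = 0$ and $y_i = 1$. The cost is

$$
c_i^{h(x_i)} = c_i^0 = \frac{1}{n} \mathbb{1}(h(x_i) \ne y_i) + h(x_i) \sum_{j \ne i} \left( \lambda_{ij} - \lambda_{ji} \right) = \frac{1}{n} \cdot 1 + \sum_{j \ne i} 0 \cdot \left( \lambda_{ij} - \lambda_{ji} \right) = \frac{1}{n}.
$$

These are exactly the costs passed to the CSC oracle, so the hypothesis it returns minimizes $L^{\rho_1}_{\lambda,\tau}$. ∎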

### 3.3 Dual player’s no regret updates

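In order to reason about convergence, we need to restrict the dual player’s action space to lie within a bounded ball, defined by the parameters $C_\lambda$ and $C_\tau$ that appear in our theorem and serve to trade off running time with approximation quality:

$$
\Lambda = \left\{ \lambda \in \mathbb{R}^{n^2}_{\ge 0} : \|\lambda\|_1 \le C_\lambda \right\}, \qquad \mathcal{T} = [0, C_\tau].
$$

The dual player will use exponentiated gradient descent (Kivinen and Warmuth, 1997) to update $\lambda$ and online gradient descent (Zinkevich, 2003) to update $\tau$, where the reward functions are defined as

$$
L^{\psi_1}_{D,\alpha}(\lambda) = \sum_{(i,j) \in [n]^2} \lambda_{ij} \left( \mathbb{E}_{h \sim D}\left[h(x_i) - h(x_j)\right] - \alpha_{ij} - \gamma \right) \quad \text{and} \quad L^{\psi_2}_{D,\alpha}(\tau) = \tau \left( \frac{1}{|A|} \sum_{(i,j) \in [n]^2} w_{ij}\,\alpha_{ij} - \eta \right).
$$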

###### Lemma 3.4.

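Running online gradient descent for $\tau$, i.e. on the rewards $L^{\psi_2}_{D^t,\alpha^t}(\tau)$, with step size $\mu^t_\tau = \frac{C_\tau}{\sqrt{T}}$ yields the following regret:

$$
\max_{\tau \in \mathcal{T}} \sum_{t=1}^{T} L^{\psi_2}_{D^t,\alpha^t}(\tau) - \sum_{t=1}^{T} L^{\psi_2}_{D^t,\alpha^t}(\tau^t) \le C_\tau \sqrt{T}.
$$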
###### Proof.

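First, note that the gradient of the reward with respect to $\tau$ is $\nabla L^{\psi_2}_{D,\alpha}(\tau) = \frac{1}{|A|} \sum_{ij} w_{ij}\,\alpha_{ij} - \eta$, and that the update is

$$
\tau^t = \mathrm{proj}_{[0, C_\tau]} \left( \tau^{t-1} + \mu^t_\tau \left( \frac{1}{|A|} \sum_{ij} w_{ij}\,\alpha^{t-1}_{ij} - \eta \right) \right).
$$

From Zinkevich (2003), we find that the regret of this online gradient descent (translated into the terms of our paper) is bounded as follows:

$$
\max_{\tau \in \mathcal{T}} \sum_{t=1}^{T} L^{\psi_2}_{D^t,\alpha^t}(\tau) - \sum_{t=1}^{T} L^{\psi_2}_{D^t,\alpha^t}(\tau^t) \le \frac{C_\tau^2}{2 \mu^T_\tau} + \frac{\left\| \nabla L^{\psi_2}_{D,\alpha} \right\|^2}{2} \sum_{t=1}^{T} \mu^t_\tau, \tag{5}
$$

where the bound on our target term $\tau$ is $C_\tau$, the gradient of our cost function at round $t$ is $\nabla L^{\psi_2}_{D^t,\alpha^t}(\tau^{t-1})$, and $\left\| \nabla L^{\psi_2}_{D,\alpha} \right\|$ denotes a bound on the norm of this gradient over all rounds. To prove the lemma, we first need to show that this bound is at most $1$.

Since $\alpha^{t}_{ij} \in [0, 1]$ for all pairs $(i, j)$, for all $t$ the gradient satisfies

$$
\left| \nabla L^{\psi_2}_{D^t,\alpha^t}(\tau^{t-1}) \right| = \left| \frac{\sum_{ij} w_{ij}\,\alpha^{t-1}_{ij}}{|A|} - \eta \right| \le 1.
$$

Thus,

$$
\left\| \nabla L^{\psi_2}_{D,\alpha} \right\| \le 1.
$$

Note that if we define $\mu^t_\tau = \frac{C_\tau}{\sqrt{T}}$, then the summation of the step sizes is equal to

$$
\sum_{t=1}^{T} \mu^t_\tau = C_\tau \sqrt{T}.
$$

Substituting these two results into inequality (5), we get that the regret satisfies

$$
\max_{\tau \in \mathcal{T}} \sum_{t=1}^{T} L^{\psi_2}_{D^t,\alpha^t}(\tau) - \sum_{t=1}^{T} L^{\psi_2}_{D^t,\alpha^t}(\tau^t) \le \frac{C_\tau^2}{2 \left( C_\tau / \sqrt{T} \right)} + \frac{1}{2} C_\tau \sqrt{T} = C_\tau \sqrt{T}.
$$

∎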
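A single projected gradient step for $\tau$ can be sketched as below, using the constant step size $C_\tau/\sqrt{T}$ from the lemma. The function name `ogd_tau_step` and the argument names (`horizon` for $T$, `num_queried_pairs` for $|A|$) are assumptions made for illustration.

```python
import numpy as np

def ogd_tau_step(tau, alpha, w, num_queried_pairs, eta, c_tau, horizon):
    """One projected online-gradient-ascent step on the dual variable tau.

    The gradient is the amount by which the weighted slack exceeds the
    budget eta; the result is projected back onto [0, c_tau].
    """
    step = c_tau / np.sqrt(horizon)
    grad = (w * alpha).sum() / num_queried_pairs - eta
    return float(np.clip(tau + step * grad, 0.0, c_tau))
```

The dual variable grows only while the budget constraint (3) is violated by the primal player's slack, and shrinks toward zero otherwise.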

###### Lemma 3.5.

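Running exponentiated gradient descent for $\lambda$ yields the following regret:

$$
\max_{\lambda \in \Lambda} \sum_{t=1}^{T} L^{\psi_1}_{D^t,\alpha^t}(\lambda) - \sum_{t=1}^{T} L^{\psi_1}_{D^t,\alpha^t}(\lambda^t) \le 2 C_\lambda \sqrt{T \log n}.
$$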
###### Proof.

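In each round, the dual player gets to charge either some constraint or no constraint at all. In other words, he is presented with $n^2 + 1$ options. Therefore, to account for the option of not charging any constraint, we define an augmented vector $\lambda'$ with $n^2 + 1$ coordinates, where the last coordinate corresponds to the option of not charging any constraint.

Next, we define the reward vector for $\lambda'^t$ as

$$
\zeta^t = \left( \left( \mathbb{E}_{h \sim D^t}\left[h(x_i) - h(x_j)\right] - \alpha^t_{ij} - \gamma \right)_{(i,j) \in [n]^2},\ 0 \right).
$$

Hence, the reward function is

$$
r(\lambda'^t) = \zeta^t \cdot \lambda'^t = L^{\psi_1}_{D^t,\alpha^t}(\lambda^t).
$$

The gradient of the reward function is

$$
\nabla r(\lambda'^t) = \left( \left( \nabla r(\lambda^t) \right)_{(i,j) \in [n]^2},\ 0 \right) = \zeta^t.
$$

Note that the norm of the gradient is bounded by 1, i.e.

$$
\left\| \nabla r(\lambda'^t) \right\|_\infty \le 1,
$$

because for any $(i, j)$, the respective component of the gradient, $\mathbb{E}_{h \sim D^t}\left[h(x_i) - h(x_j)\right] - \alpha^t_{ij} - \gamma$, is bounded by 1.

Here, by the regret bound of Kivinen and Warmuth (1997), we obtain the following regret bound:

$$
\max_{\lambda \in \Lambda} \sum_{t=1}^{T} L^{\psi_1}_{D^t,\alpha^t}(\lambda) - \sum_{t=1}^{T} L^{\psi_1}_{D^t,\alpha^t}(\lambda^t) \le \frac{\log n}{\mu} + \mu \left\| \lambda' \right\|_1^2 \left\| \nabla r(\lambda') \right\|_\infty^2 T \le \frac{\log n}{\mu} + \mu\, C_\lambda^2\, T.
$$

If we take $\mu = \frac{1}{C_\lambda} \sqrt{\frac{\log n}{T}}$, each of the two terms equals $C_\lambda \sqrt{T \log n}$, and the regret is bounded as follows:

$$
\max_{\lambda \in \Lambda} \sum_{t=1}^{T} L^{\psi_1}_{D^t,\alpha^t}(\lambda) - \sum_{t=1}^{T} L^{\psi_1}_{D^t,\alpha^t}(\lambda^t) \le 2 C_\lambda \sqrt{T \log n}.
$$

∎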
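The exact update rule is not spelled out in this section, so the following is only a sketch of one common form of exponentiated gradient: a multiplicative-weights update on the augmented dual vector, renormalized so that its entries sum to $C_\lambda$. The function names, the normalization, and the flattened vector layout are all assumptions made for illustration.

```python
import numpy as np

def eg_step_size(n, horizon, c_lam):
    """Step size (1 / C_lambda) * sqrt(log n / T) used in the regret bound."""
    return np.sqrt(np.log(n) / horizon) / c_lam

def eg_lambda_step(lam_aug, reward, c_lam, step):
    """One multiplicative-weights update on the augmented dual vector.

    lam_aug has one entry per ordered pair plus a final "no constraint"
    entry, and is assumed to sum to c_lam. reward is the matching reward
    vector, whose final entry is 0.
    """
    scaled = lam_aug * np.exp(step * reward)
    return c_lam * scaled / scaled.sum()
```

Coordinates whose constraints are currently violated (positive reward) gain weight relative to the "no constraint" coordinate, while satisfied constraints lose weight.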

### 3.4 Guarantee

Now, we appeal to Freund and Schapire (1996) to show that our no-regret dynamics converge to an approximate minmax equilibrium of $L$. Then, we show that an approximate minmax equilibrium corresponds to an approximately optimal solution to our original constrained optimization problem.

###### Theorem 3.6 (Freund and Schapire (1996)).

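Let $(D^1, \alpha^1), \dots, (D^T, \alpha^T)$ be the primal player’s sequence of actions, and $(\lambda^1, \tau^1), \dots, (\lambda^T, \tau^T)$ be the dual player’s sequence of actions. Let $\bar{D} = \frac{1}{T} \sum_{t=1}^{T} D^t$, $\bar{\alpha} = \frac{1}{T} \sum_{t=1}^{T} \alpha^t$, $\bar{\lambda} = \frac{1}{T} \sum_{t=1}^{T} \lambda^t$, and $\bar{\tau} = \frac{1}{T} \sum_{t=1}^{T} \tau^t$. Then, if the regret of the dual player satisfies

$$
\max_{\lambda \in \Lambda,\ \tau \in \mathcal{T}} \sum_{t=1}^{T} L(D^t, \alpha^t, \lambda, \tau) - \sum_{t=1}^{T} L(D^t, \alpha^t, \lambda^t, \tau^t) \le \xi_\psi T,
$$

and the primal player best responds in each round ($\xi_\rho = 0$), then $(\bar{D}, \bar{\alpha}, \bar{\lambda}, \bar{\tau})$ is a $\xi_\psi$-approximate solution.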


###### Remark 3.7.

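If the primal learner’s approximate best response satisfies

$$
\sum_{t=1}^{T} L(D^t, \alpha^t, \lambda^t, \tau^t) - \min_{D \in \Delta(\mathcal{H}),\ \alpha \in [0,1]^{n^2}} \sum_{t=1}^{T} L(D, \alpha, \lambda^t, \tau^t) \le \xi_\rho T
$$

along with the dual player’s regret of $\xi_\psi T$, then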