# Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness

The most prevalent notions of fairness in machine learning are statistical definitions: they fix a small collection of pre-defined groups, and then ask for parity of some statistic of the classifier across these groups. Constraints of this form are susceptible to (intentional or inadvertent) "fairness gerrymandering", in which a classifier appears to be fair on each individual group, but badly violates the fairness constraint on one or more structured subgroups defined over the protected attributes. We propose instead to demand statistical notions of fairness across exponentially (or infinitely) many subgroups, defined by a structured class of functions over the protected attributes. This interpolates between statistical definitions of fairness, and recently proposed individual notions of fairness, but it raises several computational challenges. It is no longer clear how to even audit a fixed classifier to see if it satisfies such a strong definition of fairness. We prove that the computational problem of auditing subgroup fairness for both equality of false positive rates and statistical parity is equivalent to the problem of weak agnostic learning, which means it is computationally hard in the worst case, even for simple structured subclasses. However, it also suggests that common heuristics for learning can be applied to successfully solve the auditing problem in practice. We then derive an algorithm that provably converges to the best fair distribution over classifiers in a given class, given access to oracles which can solve the agnostic learning and auditing problems. The algorithm is based on a formulation of subgroup fairness as fictitious play in a two-player zero-sum game between a Learner and an Auditor. We implement our algorithm using linear regression as a heuristic oracle, and show that we can effectively both audit and learn fair classifiers on real datasets.


## 1 Introduction

As machine learning is being deployed in increasingly consequential domains (including policing (Rudin, 2013), criminal sentencing (Barry-Jester et al., 2015), and lending (Koren, 2016)), the problem of ensuring that learned models are fair has become urgent.

Approaches to fairness in machine learning can coarsely be divided into two kinds: statistical and individual notions of fairness. Statistical notions typically fix a small number of protected demographic groups (such as racial groups), and then ask for (approximate) parity of some statistical measure across all of these groups. One popular statistical measure asks for equality of false positive or negative rates across all groups (this is also sometimes referred to as an equal opportunity constraint (Hardt et al., 2016)). Another asks for equality of classification rates (also known as statistical parity). These statistical notions of fairness are the kinds of fairness definitions most common in the literature (see e.g. Kamiran and Calders (2012); Hajian and Domingo-Ferrer (2013); Kleinberg et al. (2017); Hardt et al. (2016); Friedler et al. (2016); Zafar et al. (2017); Chouldechova (2017)).

One main attraction of statistical definitions of fairness is that they can in principle be obtained and checked without making any assumptions about the underlying population, and hence lead to more immediately actionable algorithmic approaches. On the other hand, individual notions of fairness ask for the algorithm to satisfy some guarantee which binds at the individual, rather than group, level. This often has the semantics that “individuals who are similar” should be treated “similarly” (Dwork et al., 2012), or “less qualified individuals should not be favored over more qualified individuals” (Joseph et al., 2016). Individual notions of fairness have attractively strong semantics, but their main drawback is that achieving them seemingly requires more assumptions to be made about the setting under consideration.

The semantics of statistical notions of fairness would be significantly stronger if they were defined over a large number of subgroups, thus permitting a rich middle ground between fairness only for a small number of coarse pre-defined groups, and the strong assumptions needed for fairness at the individual level. Consider the kind of fairness gerrymandering that can occur when we only look for unfairness over a small number of pre-defined groups:

###### Example 1.1.

Imagine a setting with two binary features, corresponding to race (say black and white) and gender (say male and female), both of which are distributed independently and uniformly at random in a population. Consider a classifier that labels an example positive if and only if it corresponds to a black man, or a white woman. Then the classifier will appear to be equitable when one considers either protected attribute alone, in the sense that it labels both men and women as positive 50% of the time, and labels both black and white individuals as positive 50% of the time. But if one looks at any conjunction of the two attributes (such as black women), then it is apparent that the classifier maximally violates the statistical parity fairness constraint. Similarly, if examples have a binary label that is also distributed uniformly at random, and independently from the features, the classifier will satisfy equal opportunity fairness with respect to either protected attribute alone, even though it maximally violates it with respect to conjunctions of two attributes.
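The arithmetic behind this example can be checked directly. The following is a minimal sketch that enumerates the four equally likely (race, gender) cells; the helper names `classify` and `positive_rate` are illustrative and not part of the paper.

```python
from itertools import product

# Four equally likely cells: race and gender are uniform and independent.
population = list(product(["black", "white"], ["male", "female"]))

def classify(race, gender):
    """Label positive iff the individual is a black man or a white woman."""
    return int((race == "black") == (gender == "male"))

def positive_rate(members):
    """Fraction of a non-empty group that is labeled positive."""
    return sum(classify(r, g) for r, g in members) / len(members)

# Acceptance rate on each group defined by a single protected attribute.
marginal_rates = {
    value: positive_rate([p for p in population if value in p])
    for value in ["black", "white", "male", "female"]
}

# Acceptance rate on each conjunction of the two attributes.
conjunction_rates = {cell: positive_rate([cell]) for cell in population}

print(marginal_rates)
print(conjunction_rates)
```

Every marginal group is accepted at the base rate, while each conjunction is accepted either always or never, which is the gap that a purely marginal audit cannot see.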

We remark that the issue raised by this toy example is not merely hypothetical. In our experiments in Section 5, we show that similar violations of fairness on subgroups of the pre-defined groups can result from the application of standard machine learning methods applied to real datasets. To avoid such problems, we would like to be able to satisfy a fairness constraint not just for the small number of protected groups defined by single protected attributes, but for a combinatorially large or even infinite collection of structured subgroups definable over protected attributes.

In this paper, we consider the problem of auditing binary classifiers for equal opportunity and statistical parity, and the problem of learning classifiers subject to these constraints, when the number of protected groups is large. There are exponentially many ways of carving up a population into subgroups, and we cannot necessarily identify a small number of these a priori as the only ones we need to be concerned about. At the same time, we cannot insist on any notion of statistical fairness for every subgroup of the population: for example, any imperfect classifier could be accused of being unfair to the subgroup of individuals defined ex-post as the set of individuals it misclassified. This simply corresponds to “overfitting” a fairness constraint. We note that the individual fairness definition of Joseph et al. (2016) (when restricted to the binary classification setting) can be viewed as asking for equalized false positive rates across the singleton subgroups, containing just one individual each. (It also asks for equalized false negative rates, and that the false positive rate is smaller than the true positive rate. Here, the randomness in the “rates” is taken entirely over the randomness of the classifier.) Naturally, in order to achieve this strong definition of fairness, Joseph et al. (2016) have to make structural assumptions about the form of the ground truth. It is, however, sensible to ask for fairness for large structured subsets of individuals: so long as these subsets have a bounded VC dimension, the statistical problem of learning and auditing fair classifiers is easy, provided the dataset is sufficiently large. This can be viewed as an interpolation between equal opportunity fairness and the individual “weakly meritocratic” fairness definition from Joseph et al. (2016), that does not require making any assumptions about the ground truth. Our investigation focuses on the computational challenges, both in theory and in practice.

### 1.1 Our Results

Briefly, our contributions are:

• Formalization of the problem of auditing and learning classifiers for fairness with respect to rich classes of subgroups $\mathcal{G}$.

• Results proving (under certain assumptions) the computational equivalence of auditing $\mathcal{G}$ and (weak) agnostic learning of $\mathcal{G}$. While these results imply theoretical intractability of auditing for some natural classes $\mathcal{G}$, they also suggest that practical machine learning heuristics can be applied to the auditing problem.

• Provably convergent algorithms for learning classifiers that are fair with respect to $\mathcal{G}$, based on a formulation as a two-player zero-sum game between a Learner (the primal player) and an Auditor (the dual player). We provide two different algorithms, both of which are based on solving for the equilibrium of this game. The first provably converges in a polynomial number of steps and is based on simulation of the game dynamics when the Learner uses Follow the Perturbed Leader and the Auditor uses best response; the second is only guaranteed to converge asymptotically but is computationally simpler, and involves both players using Fictitious Play.

• An implementation and extensive empirical evaluation of the Fictitious Play algorithm demonstrating its effectiveness on a real dataset in which subgroup fairness is a concern.

In more detail, we start by studying the computational challenge of simply checking whether a given classifier satisfies equal opportunity and statistical parity. Doing this in time linear in the number of protected groups is simple: for each protected group, we need only estimate a single expectation. However, when there are many different protected attributes which can be combined to define the protected groups, their number is combinatorially large. (For example, as discussed in a recent ProPublica investigation (Angwin and Grassegger, 2017), Facebook policy protects groups against hate speech if the group is definable as a conjunction of protected attributes. Under the Facebook schema, “race” and “gender” are both protected attributes, and so the Facebook policy protects “black women” as a distinct class, separately from black people and women.) When there are $d$ protected attributes, there are $2^d$ protected groups. As a statistical estimation problem, this is not a large obstacle: we can estimate $2^d$ expectations to error $\varepsilon$ so long as our data set has size $O(d/\varepsilon^2)$. There is now, however, a computational problem.

We model the problem by specifying a class of functions $\mathcal{G}$ defined over a set of $d$ protected attributes. $\mathcal{G}$ defines a set of protected subgroups. Each function $g \in \mathcal{G}$ corresponds to the protected subgroup $\{x : g(x) = 1\}$. (For example, in the case of Facebook’s policy, the protected attributes include “race, sex, gender identity, religious affiliation, national origin, ethnicity, sexual orientation and serious disability/disease” (Angwin and Grassegger, 2017), and $\mathcal{G}$ represents the class of boolean conjunctions. In other words, a group defined by individuals having any subset of values for the protected attributes is protected.) The first result of this paper is that for both equal opportunity and statistical parity, the computational problem of checking whether a classifier or decision-making algorithm violates statistical fairness with respect to the set of protected groups $\mathcal{G}$ is equivalent to the problem of agnostically learning $\mathcal{G}$ (Kearns et al., 1994), in a strong and distribution-specific sense. This equivalence has two implications:

1. First, it allows us to import computational hardness results from the learning theory literature. Agnostic learning turns out to be computationally hard in the worst case, even for extremely simple classes of functions $\mathcal{G}$ (like boolean conjunctions and linear threshold functions). As a result, we can conclude that auditing a classifier for statistical fairness violations with respect to a class $\mathcal{G}$ is also computationally hard. This means we should not expect to find a polynomial time algorithm that is always guaranteed to solve the auditing problem.

2. However, in practice, various learning heuristics (like boosting, logistic regression, SVMs, backpropagation for neural networks, etc.) are commonly used to learn accurate classifiers which are known to be hard to learn in the worst case. The equivalence we show between agnostic learning and auditing is distribution specific: that is, if on a particular data set, a heuristic learning algorithm can solve the agnostic learning problem (on an appropriately defined subset of the data), it can be used also to solve the auditing problem on the same data set.

These results appear in Section 3.

Next, we consider the problem of learning a classifier that equalizes false positive or negative rates across all (possibly infinitely many) sub-groups, defined by a class of functions $\mathcal{G}$. As per the reductions described above, this problem is computationally hard in the worst case.

However, under the assumption that we have an efficient oracle which solves the agnostic learning problem, we give and analyze algorithms for this problem based on a game-theoretic formulation. We first prove that the optimal fair classifier can be found as the equilibrium of a two-player, zero-sum game, in which the (pure) strategy space of the “Learner” player corresponds to classifiers in $\mathcal{H}$, and the (pure) strategy space of the “Auditor” player corresponds to subgroups defined by $\mathcal{G}$. The best response problems for the two players correspond to agnostic learning and auditing, respectively. We show that both problems can be solved with a single call to a cost sensitive classification oracle, which is equivalent to an agnostic learning oracle. We then draw on extant theory for learning in games and no-regret algorithms to derive two different algorithms based on simulating game play in this formulation. In the first, the Learner employs the well-studied Follow the Perturbed Leader (FTPL) algorithm on an appropriate linearization of its best-response problem, while the Auditor approximately best-responds to the distribution over classifiers of the Learner at each step. Since FTPL has a no-regret guarantee, we obtain an algorithm that provably converges in a polynomial number of steps.

While it enjoys strong provable guarantees, this first algorithm is randomized (due to the noise added by FTPL), and the best-response step for the Auditor is polynomial time but computationally expensive. We thus propose a second algorithm that is deterministic, simpler and faster per step, based on both players adopting the Fictitious Play learning dynamic. This algorithm has weaker theoretical guarantees (it has provable convergence only asymptotically, and not in a polynomial number of steps) but is more practical and converges rapidly in practice. The derivations of these algorithms (and their guarantees) appear in Section 4.

Finally, we implement the Fictitious Play algorithm and demonstrate its practicality by efficiently learning classifiers that approximately equalize false positive rates across any group definable by a linear threshold function on 18 protected attributes in the “Communities and Crime” dataset. We use simple, fast regression algorithms as heuristics to implement agnostic learning oracles, and (via our reduction from agnostic learning to auditing) auditing oracles. Our results suggest that it is possible in practice to learn fair classifiers with respect to a large class of subgroups that still achieve non-trivial error. We also implement the algorithm of Agarwal et al. (2017) to learn a classifier that approximately equalizes false positive rates on the same dataset on the 36 groups defined just by the 18 individual protected attributes. We then audit this learned classifier with respect to all linear threshold functions on the 18 protected attributes, and find a subgroup on which the fairness constraint is substantially violated, despite fairness being achieved on all marginal attributes. This shows that phenomena like the one in Example 1.1 can arise in real learning problems. Full details are contained in Section 5.

### 1.2 Further Related Work

Independent of our work, Hébert-Johnson et al. (2017) also consider a related and complementary notion of fairness that they call “multicalibration”. In settings in which one wishes to train a real-valued predictor, multicalibration can be considered the “calibration” analogue for the definitions of subgroup fairness that we give for false positive rates, false negative rates, and classification rates. For a real-valued predictor, calibration informally requires that for every value $v \in [0,1]$ predicted by an algorithm, the fraction of individuals who truly have a positive label in the subset of individuals on which the algorithm predicted $v$ should be approximately equal to $v$. Multicalibration asks for approximate calibration on every set defined implicitly by some circuit in a set $\mathcal{G}$. Hébert-Johnson et al. (2017) give an algorithmic result that is analogous to the one we give for learning subgroup fair classifiers: a polynomial time algorithm for learning a multi-calibrated predictor, given an agnostic learning algorithm for $\mathcal{G}$. In addition to giving a polynomial-time algorithm, we also give a practical variant of our algorithm (which is however only guaranteed to converge in the limit) that we use to conduct empirical experiments on real data.

Thematically, the most closely related piece of prior work is Zhang and Neill (2016), who also aim to audit classification algorithms for discrimination in subgroups that have not been pre-defined. Our work differs from theirs in a number of important ways. First, we audit the algorithm for common measures of statistical unfairness, whereas Zhang and Neill (2016) design a new measure compatible with their particular algorithmic technique. Second, we give a formal analysis of our algorithm. Third, we audit with respect to subgroups defined by a class of functions $\mathcal{G}$, which we can take to have bounded VC dimension, which allows us to give formal out-of-sample guarantees. Zhang and Neill (2016) attempt to audit with respect to all possible sub-groups, which introduces a severe multiple-hypothesis testing problem, and risks overfitting. Most importantly, we give actionable algorithms for learning subgroup fair classifiers, whereas Zhang and Neill (2016) restrict attention to auditing.

Technically, the most closely related piece of work (and from which we take inspiration for our algorithm in Section 4) is Agarwal et al. (2017), who show that given access to an agnostic learning oracle for a class $\mathcal{H}$, there is an efficient algorithm to find the lowest-error distribution over classifiers in $\mathcal{H}$ subject to equalizing false positive rates across polynomially many subgroups. Their algorithm can be viewed as solving the same zero-sum game that we solve, but in which the “subgroup” player plays gradient descent over his pure strategies, one for each sub-group. This ceases to be an efficient or practical algorithm when the number of subgroups is large, as is our case. Our main insight is that an agnostic learning oracle is sufficient to have both players play “fictitious play”, and that there is a transformation of the best response problem such that an agnostic learning algorithm is enough to efficiently implement follow the perturbed leader.

There is also other work showing computational hardness for fair learning problems. Most notably, Woodworth et al. (2017) show that finding a linear threshold classifier that approximately minimizes hinge loss subject to equalizing false positive rates across populations is computationally hard (assuming that refuting a random $k$-XOR formula is hard). In contrast, we show that even checking whether a classifier satisfies a false positive rate constraint on a particular data set is computationally hard (if the number of subgroups on which fairness is desired is too large to enumerate).

## 2 Model and Preliminaries

We model each individual as being described by a tuple $((x, x'), y)$, where $x \in \mathcal{X}$ denotes a vector of protected attributes, $x' \in \mathcal{X}'$ denotes a vector of unprotected attributes, and $y \in \{0,1\}$ denotes a label. Note that in our formulation, an auditing algorithm not only may not see the unprotected attributes $x'$, it may not even be aware of their existence. For example, $x'$ may represent proprietary features or consumer data purchased by a credit scoring company.

We will write $X = (x, x')$ to denote the joint feature vector. We assume that points $(X, y)$ are drawn i.i.d. from an unknown distribution $\mathcal{P}$. Let $D$ be a decision making algorithm, and let $D(X)$ denote the (possibly randomized) decision induced by $D$ on individual $(X, y)$. We restrict attention in this paper to the case in which $D$ makes a binary classification decision: $D(X) \in \{0,1\}$. Thus we alternately refer to $D$ as a classifier. When auditing a fixed classifier $D$, it will be helpful to make reference to the distribution over examples $(X, y)$ together with their induced classification $D(X)$. Let $P_{\text{audit}}(D)$ denote the induced target joint distribution over the tuple $(x, y, D(X))$ that results from sampling $(x, x', y) \sim \mathcal{P}$, and providing $x$, the true label $y$, and the classification $D(X)$ but not the unprotected attributes $x'$. Note that the randomness here is over both the randomness of $\mathcal{P}$, and the potential randomness of the classifier $D$.

We will be concerned with learning and auditing classifiers $D$ satisfying two common statistical fairness constraints: equality of classification rates (also known as statistical parity), and equality of false positive rates (also known as equal opportunity). Auditing for equality of false negative rates is symmetric and so we do not explicitly consider it. Each fairness constraint is defined with respect to a set of protected groups. We define sets of protected groups via a family of indicator functions $\mathcal{G}$ for those groups, defined over protected attributes. Each $g \in \mathcal{G}$ has the semantics that $g(x) = 1$ indicates that an individual with protected features $x$ is in group $g$.

###### Definition 2.1 (Statistical Parity (SP) Subgroup Fairness).

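Fix any classifier $D$, distribution $\mathcal{P}$, collection of group indicators $\mathcal{G}$, and parameter $\gamma \in [0,1]$. For each $g \in \mathcal{G}$, define

$$\alpha_{SP}(g,\mathcal{P}) = \Pr_{\mathcal{P}}[g(x)=1] \quad\text{and}\quad \beta_{SP}(g,D,\mathcal{P}) = \left|\mathrm{SP}(D) - \mathrm{SP}(D,g)\right|,$$

where $\mathrm{SP}(D) = \Pr_{\mathcal{P},D}[D(X)=1]$ and $\mathrm{SP}(D,g) = \Pr_{\mathcal{P},D}[D(X)=1 \mid g(x)=1]$ denote the overall acceptance rate of $D$ and the acceptance rate of $D$ on group $g$ respectively. We say that $D$ satisfies $\gamma$-statistical parity (SP) Fairness with respect to $\mathcal{P}$ and $\mathcal{G}$ if for every $g \in \mathcal{G}$

$$\alpha_{SP}(g,\mathcal{P})\,\beta_{SP}(g,D,\mathcal{P}) \le \gamma.$$

We will sometimes refer to $\mathrm{SP}(D)$ as the SP base rate.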
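To make the definition concrete, the product of the group mass and the acceptance-rate gap can be computed on a finite sample. The following is a minimal sketch that treats the sample as the distribution of interest; the function names `sp_violation` and `is_sp_fair` are illustrative, not the paper's.

```python
import numpy as np

def sp_violation(group, decisions):
    """Return alpha_SP * beta_SP for one group on a finite sample.

    group:     0/1 array holding g(x) for each individual
    decisions: 0/1 array holding D(X) for each individual
    """
    group = np.asarray(group, dtype=bool)
    decisions = np.asarray(decisions, dtype=float)
    alpha = group.mean()                  # empirical Pr[g(x) = 1]
    if alpha == 0:
        return 0.0                        # empty group contributes no violation
    base_rate = decisions.mean()          # SP(D): overall acceptance rate
    group_rate = decisions[group].mean()  # SP(D, g): acceptance rate on the group
    return float(alpha * abs(base_rate - group_rate))

def is_sp_fair(groups, decisions, gamma):
    """Check gamma-SP fairness over an explicitly enumerated list of groups."""
    return all(sp_violation(g, decisions) <= gamma for g in groups)

# Toy sample: one individual per (race, gender) cell, with decisions that
# accept exactly the cells where the two attributes agree.
race = np.array([1, 1, 0, 0])
gender = np.array([1, 0, 1, 0])
decisions = np.array([1, 0, 0, 1])
conjunction = race * (1 - gender)         # race = 1 and gender = 0

print(sp_violation(race, decisions), sp_violation(conjunction, decisions))
```

Enumerating groups explicitly, as `is_sp_fair` does, is only feasible for small collections; the computational question studied in the paper is what to do when the class of groups is too large to enumerate.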

###### Remark 2.2.

Note that our definition references two approximation parameters, both of which are important. We are allowed to ignore a group $g$ if it (or its complement) represents only a small fraction of the total probability mass. The parameter $\alpha$ governs how small a fraction of the population we are allowed to ignore. Similarly, we do not require that the probability of a positive classification in every subgroup is exactly equal to the base rate, but instead allow deviations up to $\beta$. Both of these approximation parameters are necessary from a statistical estimation perspective. We control both of them with a single parameter $\gamma$.

###### Definition 2.3 (False Positive (FP) Subgroup Fairness).

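Fix any classifier $D$, distribution $\mathcal{P}$, collection of group indicators $\mathcal{G}$, and parameter $\gamma \in [0,1]$. For each $g \in \mathcal{G}$, define

$$\alpha_{FP}(g,\mathcal{P}) = \Pr_{\mathcal{P}}[g(x)=1,\, y=0] \quad\text{and}\quad \beta_{FP}(g,D,\mathcal{P}) = \left|\mathrm{FP}(D) - \mathrm{FP}(D,g)\right|,$$

where $\mathrm{FP}(D) = \Pr_{D,\mathcal{P}}[D(X)=1 \mid y=0]$ and $\mathrm{FP}(D,g) = \Pr_{D,\mathcal{P}}[D(X)=1 \mid g(x)=1,\, y=0]$ denote the overall false-positive rate of $D$ and the false-positive rate of $D$ on group $g$ respectively.

We say $D$ satisfies $\gamma$-False Positive (FP) Fairness with respect to $\mathcal{P}$ and $\mathcal{G}$ if for every $g \in \mathcal{G}$

$$\alpha_{FP}(g,\mathcal{P})\,\beta_{FP}(g,D,\mathcal{P}) \le \gamma.$$

We will sometimes refer to $\mathrm{FP}(D)$ as the FP base rate.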
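The false positive variant differs only in that every quantity is restricted to negative examples. A minimal sketch on a finite sample follows, again treating the sample as the distribution; `fp_violation` is a hypothetical helper name.

```python
import numpy as np

def fp_violation(group, decisions, labels):
    """Return alpha_FP * beta_FP for one group on a finite sample."""
    group = np.asarray(group, dtype=bool)
    decisions = np.asarray(decisions, dtype=float)
    negatives = np.asarray(labels) == 0
    group_negatives = group & negatives
    alpha = group_negatives.mean()                 # Pr[g(x) = 1, y = 0]
    if alpha == 0:
        return 0.0                                 # no negatives in the group
    fp_overall = decisions[negatives].mean()       # FP(D)
    fp_group = decisions[group_negatives].mean()   # FP(D, g)
    return float(alpha * abs(fp_overall - fp_group))

# Toy sample: each (race, gender) cell appears once with y = 0 and once with y = 1.
race = np.array([1, 1, 0, 0, 1, 1, 0, 0])
gender = np.array([1, 0, 1, 0, 1, 0, 1, 0])
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
decisions = (race == gender).astype(int)           # ignores the label entirely
conjunction = race * (1 - gender)

print(fp_violation(race, decisions, labels))
print(fp_violation(conjunction, decisions, labels))
```

Note that the weight `alpha` here is the mass of negative examples inside the group, not the mass of the group itself, so a large group with few negatives can still be ignored.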

###### Remark 2.4.

This definition is symmetric to the definition of statistical parity fairness, except that the parameter $\alpha$ is now used to exclude any group $g$ such that negative examples ($y=0$) from $g$ (or its complement) have probability mass less than $\alpha$. This is again necessary from a statistical estimation perspective.

For either statistical parity or false positive fairness, if the algorithm $D$ fails to satisfy the $\gamma$-fairness condition, then we say that $D$ is $\gamma$-unfair with respect to $\mathcal{P}$ and $\mathcal{G}$. We call any subgroup $g$ which witnesses this unfairness a $\gamma$-unfair certificate for $(D, \mathcal{P})$.

An auditing algorithm for a notion of fairness is given sample access to $P_{\text{audit}}(D)$ for some classifier $D$. It will either deem $D$ to be fair with respect to $\mathcal{P}$, or will else produce a certificate of unfairness.

###### Definition 2.5 (Auditing Algorithm).

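Fix a notion of fairness (either statistical parity or false positive fairness), a collection of group indicators $\mathcal{G}$ over the protected features, and any $\delta, \gamma, \gamma' \in (0,1)$ such that $\gamma' \le \gamma$. A $(\gamma, \gamma')$-auditing algorithm for $\mathcal{G}$ with respect to distribution $\mathcal{P}$ is an algorithm $\mathcal{A}$ such that for any classifier $D$, when given access to the distribution $P_{\text{audit}}(D)$, $\mathcal{A}$ runs in time $\mathrm{poly}(1/\gamma', \log(1/\delta))$, and with probability $1-\delta$, outputs a $\gamma'$-unfair certificate for $D$ whenever $D$ is $\gamma$-unfair with respect to $\mathcal{P}$ and $\mathcal{G}$. If $D$ is $\gamma'$-fair, $\mathcal{A}$ will output “fair”.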

As we will show, our definition of auditing is closely related to weak agnostic learning.

###### Definition 2.6 (Weak Agnostic Learning (Kearns et al., 1994; Kalai et al., 2008)).

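Let $Q$ be a distribution over $\mathcal{X} \times \{0,1\}$ and let $\varepsilon, \varepsilon' \in (0, 1/2)$ such that $\varepsilon \ge \varepsilon'$. We say that the function class $\mathcal{G}$ is $(\varepsilon, \varepsilon')$-weakly agnostically learnable under distribution $Q$ if there exists an algorithm $L$ such that when given sample access to $Q$, $L$ runs in time $\mathrm{poly}(1/\varepsilon', 1/\delta)$, and with probability $1-\delta$, outputs a hypothesis $h \in \mathcal{G}$ such that

$$\min_{f \in \mathcal{G}} \mathrm{err}(f, Q) \le 1/2 - \varepsilon \;\Longrightarrow\; \mathrm{err}(h, Q) \le 1/2 - \varepsilon',$$

where $\mathrm{err}(h, Q) = \Pr_{(x,y) \sim Q}[h(x) \ne y]$.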

#### Cost-Sensitive Classification.

In this paper, we will also give reductions to cost-sensitive classification (CSC) problems. Formally, an instance of a CSC problem for the class $\mathcal{H}$ is given by a set of $n$ tuples $\{(X_i, c_i^0, c_i^1)\}_{i=1}^n$ such that $c_i^\ell$ corresponds to the cost for predicting label $\ell$ on point $X_i$. Given such an instance as input, a CSC oracle finds a hypothesis $\hat{h} \in \mathcal{H}$ that minimizes the total cost across all points:

$$\hat{h} \in \operatorname*{argmin}_{h \in \mathcal{H}} \sum_{i=1}^{n} \left[ h(X_i)\, c_i^1 + (1 - h(X_i))\, c_i^0 \right] \tag{1}$$

A crucial property of a CSC problem is that the solution is invariant to translations of the costs.

###### Claim 2.7.

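Let $\{(X_i, c_i^0, c_i^1)\}_{i=1}^n$ be a CSC instance, and $\{(\tilde{c}_i^0, \tilde{c}_i^1)\}_{i=1}^n$ be a set of new costs such that there exist $a_1, \ldots, a_n \in \mathbb{R}$ such that $\tilde{c}_i^\ell = c_i^\ell + a_i$ for all $i$ and $\ell$. Then

$$\operatorname*{argmin}_{h \in \mathcal{H}} \sum_{i=1}^{n} \left[ h(X_i)\, c_i^1 + (1 - h(X_i))\, c_i^0 \right] = \operatorname*{argmin}_{h \in \mathcal{H}} \sum_{i=1}^{n} \left[ h(X_i)\, \tilde{c}_i^1 + (1 - h(X_i))\, \tilde{c}_i^0 \right]$$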
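The translation invariance can be illustrated numerically: adding the same offset to both costs of a point changes every hypothesis's total cost by the same constant, so the minimizer is unchanged. The sketch below assumes a small hypothetical class of one-dimensional threshold classifiers and a brute-force oracle; neither is taken from the paper.

```python
import numpy as np

def total_costs(hypotheses, X, c0, c1):
    """Total CSC cost of every hypothesis, following objective (1)."""
    return np.array([np.sum(h(X) * c1 + (1 - h(X)) * c0) for h in hypotheses])

def csc_oracle(hypotheses, X, c0, c1):
    """Brute-force CSC oracle: index of a cost-minimizing hypothesis."""
    return int(np.argmin(total_costs(hypotheses, X, c0, c1)))

# Hypothetical class: thresholds h_t(x) = 1[x >= t] on the unit interval.
thresholds = [0.0, 0.25, 0.5, 0.75, 1.01]
hypotheses = [lambda X, t=t: (X >= t).astype(float) for t in thresholds]

rng = np.random.default_rng(0)
n = 50
X = rng.uniform(size=n)
c0, c1 = rng.normal(size=n), rng.normal(size=n)
a = rng.normal(scale=10.0, size=n)        # per-point translations a_i

original = total_costs(hypotheses, X, c0, c1)
shifted = total_costs(hypotheses, X, c0 + a, c1 + a)
same_minimizer = (
    csc_oracle(hypotheses, X, c0, c1) == csc_oracle(hypotheses, X, c0 + a, c1 + a)
)
print(shifted - original, a.sum(), same_minimizer)
```

Each entry of `shifted - original` equals the sum of the offsets, because every point contributes exactly one of its two costs regardless of the prediction.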
###### Remark 2.8.

We note that cost-sensitive classification is polynomially equivalent to agnostic learning (Zadrozny et al., 2003). We give both definitions above because when describing our results for auditing, we wish to directly appeal to known hardness results for weak agnostic learning, but it is more convenient to describe our algorithms via oracles for cost-sensitive classification.

#### Follow the Perturbed Leader.

We will make use of the Follow the Perturbed Leader (FTPL) algorithm as a no-regret learner for online linear optimization problems (Kalai and Vempala, 2005). To formalize the algorithm, consider $S \subset \{0,1\}^d$ to be a set of “actions” for a learner in an online decision problem. The learner interacts with an adversary over $T$ rounds, and in each round $t$, the learner (randomly) chooses some action $a^t \in S$, and the adversary chooses a loss vector $\ell^t \in [-M, M]^d$. The learner incurs a loss of $\langle \ell^t, a^t \rangle$ at round $t$.

FTPL is a simple algorithm that in each round perturbs the cumulative loss vector over the previous rounds $\sum_{s < t} \ell^s$, and chooses the action that minimizes loss with respect to the perturbed cumulative loss vector. We present the full algorithm in Algorithm 1, and its formal guarantee in Theorem 2.9.

###### Theorem 2.9 (Kalai and Vempala (2005)).

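For any sequence of loss vectors $\ell^1, \ldots, \ell^T$, the FTPL algorithm has regret

$$\mathbb{E}\left[\sum_{t=1}^{T} \langle \ell^t, a^t \rangle\right] - \min_{a \in S} \sum_{t=1}^{T} \langle \ell^t, a \rangle \le 2\, d^{5/4} M \sqrt{T}$$

where the randomness is taken over the perturbations across rounds.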
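The update described above can be sketched in a few lines. The paper's Algorithm 1 is not reproduced in this section, so the following makes its own assumptions: perturbations are drawn uniformly from $[0, 1/\epsilon]^d$ in the style of Kalai and Vempala (2005), the action set is small enough to enumerate, and the names `ftpl` and `epsilon` are illustrative.

```python
import numpy as np

def ftpl(actions, losses, epsilon, seed=0):
    """Follow the Perturbed Leader over a finite, enumerable action set.

    actions: (k, d) array, one action per row
    losses:  (T, d) array, row t is the adversary's loss vector in round t
    epsilon: noise scale; perturbations are uniform on [0, 1/epsilon]^d (assumed)
    Returns the chosen action indices and the learner's total loss.
    """
    rng = np.random.default_rng(seed)
    actions = np.asarray(actions, dtype=float)
    losses = np.asarray(losses, dtype=float)
    cumulative = np.zeros(actions.shape[1])
    chosen, total = [], 0.0
    for loss in losses:
        noise = rng.uniform(0.0, 1.0 / epsilon, size=cumulative.shape)
        # Best action against the perturbed cumulative loss of earlier rounds.
        idx = int(np.argmin(actions @ (cumulative + noise)))
        chosen.append(idx)
        total += float(actions[idx] @ loss)
        cumulative += loss   # the loss is revealed only after the learner commits
    return chosen, total

# Two "expert" actions; the first always incurs loss 1, the second loss 0.
actions = np.eye(2)
losses = np.tile([1.0, 0.0], (200, 1))
chosen, total = ftpl(actions, losses, epsilon=0.1)
print(total, chosen[-1])
```

In this toy run the noise is bounded by 10, so once the cumulative loss of the bad action reaches 10 it can no longer be selected, and the learner's total loss stays bounded while the best fixed action has loss 0. In the paper's setting the argmin over actions would be delegated to an oracle rather than computed by enumeration.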

### 2.1 Generalization Error

In this section, we observe that the error rate of a classifier $D$, as well as the degree to which it violates $\gamma$-fairness (for both statistical parity and false positive rates), can be accurately approximated with the empirical estimates for these quantities on a dataset $S$ (drawn i.i.d. from the underlying distribution $\mathcal{P}$) so long as the dataset is sufficiently large. Once we establish this fact, since our main interest is in the computational problem of auditing and learning, in the rest of the paper, we assume that we have direct access to the underlying distribution (or equivalently, that the empirical data defines the distribution of interest), and do not make further reference to sample complexity or overfitting issues.

A standard VC dimension bound (see, e.g. Kearns and Vazirani (1994)) states:

###### Theorem 2.10.

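Fix a class of functions $\mathcal{H}$. For any distribution $\mathcal{P}$, let $S \sim \mathcal{P}^m$ be a dataset consisting of $m$ examples $(X_i, y_i)$ sampled i.i.d. from $\mathcal{P}$. Then for any $0 < \delta < 1$, with probability $1-\delta$, for every $h \in \mathcal{H}$, we have:

$$\left|\mathrm{err}(h,\mathcal{P}) - \mathrm{err}(h,S)\right| \le O\left(\sqrt{\frac{\mathrm{VCDIM}(\mathcal{H}) \log m + \log(1/\delta)}{m}}\right)$$

where $\mathrm{err}(h,S) = \frac{1}{m}\sum_{i=1}^{m} \mathbb{1}[h(X_i) \ne y_i]$.

The above theorem implies that so long as $m \ge \tilde{O}(\mathrm{VCDIM}(\mathcal{H})/\varepsilon^2)$, then minimizing error over the empirical sample $S$ suffices to minimize error up to an additive $\varepsilon$ term on the true distribution $\mathcal{P}$. Below, we give two analogous statements for fairness constraints: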

###### Theorem 2.11 (SP Uniform Convergence).

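Fix a class of functions $\mathcal{H}$ and a class of group indicators $\mathcal{G}$. For any distribution $\mathcal{P}$, let $S \sim \mathcal{P}^m$ be a dataset consisting of $m$ examples $(X_i, y_i)$ sampled i.i.d. from $\mathcal{P}$. Then for any $0 < \delta < 1$, with probability $1-\delta$, for every $h \in \mathcal{H}$ and $g \in \mathcal{G}$, we have:

$$\left|\alpha_{SP}(g,P_S)\,\beta_{SP}(g,h,P_S) - \alpha_{SP}(g,\mathcal{P})\,\beta_{SP}(g,h,\mathcal{P})\right| \le \tilde{O}\left(\sqrt{\frac{(\mathrm{VCDIM}(\mathcal{H}) + \mathrm{VCDIM}(\mathcal{G})) \log m + \log(1/\delta)}{m}}\right)$$

where $P_S$ denotes the empirical distribution over the realized sample $S$.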

Similarly:

###### Theorem 2.12 (FP Uniform Convergence).

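Fix a class of functions $\mathcal{H}$ and a class of group indicators $\mathcal{G}$. For any distribution $\mathcal{P}$, let $S \sim \mathcal{P}^m$ be a dataset consisting of $m$ examples $(X_i, y_i)$ sampled i.i.d. from $\mathcal{P}$. Then for any $0 < \delta < 1$, with probability $1-\delta$, for every $h \in \mathcal{H}$ and $g \in \mathcal{G}$, we have:

$$\left|\alpha_{FP}(g,P_S)\,\beta_{FP}(g,h,P_S) - \alpha_{FP}(g,\mathcal{P})\,\beta_{FP}(g,h,\mathcal{P})\right| \le \tilde{O}\left(\sqrt{\frac{(\mathrm{VCDIM}(\mathcal{H}) + \mathrm{VCDIM}(\mathcal{G})) \log m + \log(1/\delta)}{m}}\right)$$

where $P_S$ denotes the empirical distribution over the realized sample $S$.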

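These theorems together imply that for both SP and FP subgroup fairness, the degree to which a group $g$ violates the constraint of $\gamma$-fairness can be estimated up to error $\varepsilon$, so long as $m \ge \tilde{O}\left((\mathrm{VCDIM}(\mathcal{H}) + \mathrm{VCDIM}(\mathcal{G}))/\varepsilon^2\right)$. The proofs can be found in Appendix B.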

## 3 Equivalence of Auditing and Weak Agnostic Learning

In this section, we give a reduction from the problem of auditing both statistical parity and false positive rate fairness, to the problem of agnostic learning, and vice versa. This has two implications. The main implication is that, from a worst-case analysis point of view, auditing is computationally hard in almost every case (since it inherits this pessimistic state of affairs from agnostic learning). However, worst-case hardness results in learning theory have not prevented the successful practice of machine learning, and there are many heuristic algorithms that in real-world cases successfully solve “hard” agnostic learning problems. Our reductions also imply that these heuristics can be used successfully as auditing algorithms, and we exploit this in the development of our algorithmic results and their experimental evaluation.

We make the following mild assumption on the class of group indicators , to aid in our reductions. It is satisfied by most natural classes of functions, but is in any case essentially without loss of generality (since learning negated functions can be simulated by learning the original function class on a dataset with flipped class labels).

###### Assumption 3.1.

We assume the set of group indicators $G$ satisfies closure under negation: for any $g \in G$, we also have $\neg g \in G$.

Recalling that and the following notions will be useful for describing our results:

• and .

• and .

• and .

• : the marginal distribution on .

• : the conditional distribution on , conditioned on .

We will think about these as the target distributions for a learning problem: i.e. the problem of learning to predict $D(X)$ from only the protected features $x$. We will relate the ability to agnostically learn on these distributions to the ability to audit $D$ given access to the original distribution $P$.

### 3.1 Statistical Parity Fairness

We give our reduction first for SP subgroup fairness. The reduction for FP subgroup fairness will follow as a corollary, since auditing for FP subgroup fairness can be viewed as auditing for statistical parity fairness on the subset of the data restricted to $y = 0$.

###### Theorem 3.2.

Fix any distribution , and any set of group indicators . Then for any , the following relationships hold:

• If there is a auditing algorithm for for all such that , then the class is -weakly agnostically learnable under .

• If is -weakly agnostically learnable under distribution for all such that , then there is a auditing algorithm for for SP fairness under .

We will prove Theorem 3.2 in two steps. First, we show that any unfairness certificate $f$ for $D$ has non-trivial accuracy in predicting the decisions made by $D$ from the sensitive attributes.

###### Lemma 3.3.

Suppose that the base rate $SP(D) \le 1/2$ and there exists a function $f$ such that

$$\alpha_{SP}(f,P)\,\beta_{SP}(f,D,P)=\gamma.$$

Then

$$\max\{\Pr[D(X)=f(x)],\ \Pr[D(X)=\neg f(x)]\}\ge SP(D)+\gamma.$$

###### Proof.

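To simplify notation, let $b = SP(D)$ denote the base rate, $\alpha = \alpha_{SP}(f,P)$ and $\beta = \beta_{SP}(f,D,P)$. First, observe that either $\Pr[D(X)=1 \mid f(x)=1] = b+\beta$ or $\Pr[D(X)=1 \mid f(x)=1] = b-\beta$ holds.

In the first case, we know $\Pr[D(X)=1 \mid f(x)=0] < b$, and so $\Pr[D(X)=0 \mid f(x)=0] > 1-b$. It follows that

$$\begin{aligned}
\Pr[D(X)=f(x)] &= \Pr[D(X)=f(x)=1]+\Pr[D(X)=f(x)=0] \\
&= \Pr[D(X)=1 \mid f(x)=1]\Pr[f(x)=1]+\Pr[D(X)=0 \mid f(x)=0]\Pr[f(x)=0] \\
&> \alpha(b+\beta)+(1-\alpha)(1-b) \\
&= (\alpha-1)b+(1-\alpha)(1-b)+b+\alpha\beta \\
&= (1-\alpha)(1-2b)+b+\alpha\beta.
\end{aligned}$$

In the second case, we have $\Pr[D(X)=0 \mid f(x)=1] = 1-b+\beta$ and $\Pr[D(X)=1 \mid f(x)=0] > b$. We can then bound

$$\begin{aligned}
\Pr[D(X)=\neg f(x)] &= \Pr[D(X)=1 \mid f(x)=0]\Pr[f(x)=0]+\Pr[D(X)=0 \mid f(x)=1]\Pr[f(x)=1] \\
&> (1-\alpha)b+\alpha(1-b+\beta) = \alpha(1-2b)+b+\alpha\beta.
\end{aligned}$$

In both cases, we have $(1-2b) \ge 0$ by our assumption on the base rate. Since $\alpha \in [0,1]$, we know

$$\max\{\Pr[D(X)=f(x)],\ \Pr[D(X)=\neg f(x)]\}\ge b+\alpha\beta=b+\gamma$$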

which recovers our bound. ∎
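The inequality of Lemma 3.3 can be sanity-checked numerically. The sketch below is an illustration only: it draws random joint distributions over the pair $(D(X), f(x))$, keeps those with base rate at most $1/2$, and records the slack in the lemma's bound. It assumes $\alpha_{SP}(f,P)=\Pr[f(x)=1]$ and $\beta_{SP}(f,D,P)=|\Pr[D(X)=1\mid f(x)=1]-SP(D)|$; all names are hypothetical.

```python
import random


def lemma_3_3_slack(joint):
    """joint[(d, f)] = Pr[D(X) = d, f(x) = f] for d, f in {0, 1}.

    Returns max{Pr[D = f], Pr[D = not f]} - (b + alpha * beta),
    which Lemma 3.3 says is non-negative when b <= 1/2.
    """
    b = joint[(1, 0)] + joint[(1, 1)]          # base rate SP(D)
    alpha = joint[(0, 1)] + joint[(1, 1)]      # Pr[f(x) = 1]
    beta = abs(joint[(1, 1)] / alpha - b)      # |Pr[D = 1 | f = 1] - b|
    agree = joint[(0, 0)] + joint[(1, 1)]      # Pr[D(X) = f(x)]
    disagree = joint[(0, 1)] + joint[(1, 0)]   # Pr[D(X) = not f(x)]
    return max(agree, disagree) - (b + alpha * beta)


rng = random.Random(0)
keys = [(0, 0), (0, 1), (1, 0), (1, 1)]
slacks = []
for _ in range(5000):
    # Normalized exponentials give a uniform draw from the probability simplex.
    weights = [rng.expovariate(1.0) for _ in keys]
    total = sum(weights)
    joint = {k: w / total for k, w in zip(keys, weights)}
    if joint[(1, 0)] + joint[(1, 1)] <= 0.5:   # the lemma assumes SP(D) <= 1/2
        slacks.append(lemma_3_3_slack(joint))
worst_slack = min(slacks)
```

A non-negative `worst_slack` (up to floating-point error) is consistent with the lemma; it is not a substitute for the proof.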

In the next step, we show that if there exists any function $f$ that accurately predicts the decisions made by the algorithm $D$, then either $f$ or $\neg f$ can serve as an unfairness certificate for $D$.

###### Lemma 3.4.

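Suppose that the base rate $SP(D) \ge 1/2$ and there exists a function $f$ such that $\Pr[D(X)=f(x)] \ge SP(D)+\gamma$ for some value $\gamma \in (0,1/2)$. Then there exists a function $g$ such that

$$\alpha_{SP}(g,P)\,\beta_{SP}(g,D,P)\ge\gamma/2,$$

where $g \in \{f, \neg f\}$.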

###### Proof.

Let $b = SP(D)$. We can expand $\Pr[D(X)=f(x)]$ as follows:

$$\begin{aligned}
\Pr[D(X)=f(x)] &= \Pr[D(X)=f(x)=1]+\Pr[D(X)=f(x)=0] \\
&= \Pr[D(X)=1 \mid f(x)=1]\Pr[f(x)=1]+\Pr[D(X)=0 \mid f(x)=0]\Pr[f(x)=0].
\end{aligned}$$

This means

$$\Pr[D(X)=f(x)]-b=(\Pr[D(X)=1 \mid f(x)=1]-b)\Pr[f(x)=1]+(\Pr[D(X)=0 \mid f(x)=0]-b)\Pr[f(x)=0]\ge\gamma.$$

Suppose that $(\Pr[D(X)=1 \mid f(x)=1]-b)\Pr[f(x)=1] \ge \gamma/2$; then our claim holds with $g = f$. Suppose not; then we must have

$$(\Pr[D(X)=0 \mid f(x)=0]-b)\Pr[f(x)=0]=((1-b)-\Pr[D(X)=1 \mid f(x)=0])\Pr[f(x)=0]\ge\gamma/2.$$

Note that $b \ge 1-b$ by our assumption $b \ge 1/2$. This means

$$(b-\Pr[D(X)=1 \mid f(x)=0])\Pr[f(x)=0]\ge((1-b)-\Pr[D(X)=1 \mid f(x)=0])\Pr[f(x)=0]\ge\gamma/2,$$

which implies that our claim holds with $g = \neg f$. ∎
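As with the previous lemma, the statement can be checked numerically on random joint distributions of $(D(X), f(x))$. The sketch below assumes the lemma's hypotheses read as base rate $b \ge 1/2$ and $\Pr[D(X)=f(x)] \ge b+\gamma$ with $\gamma>0$, and that $\alpha_{SP}(g,P)\,\beta_{SP}(g,D,P)=\Pr[g(x)=1]\cdot|\Pr[D(X)=1\mid g(x)=1]-b|$; the names are hypothetical.

```python
import random


def advantage_and_certificate(joint):
    """joint[(d, f)] = Pr[D(X) = d, f(x) = f].

    Returns (gamma, strength), where gamma = Pr[D = f] - b and strength is
    the larger of alpha * beta for the candidate groups g = f and g = not f.
    """
    b = joint[(1, 0)] + joint[(1, 1)]              # base rate SP(D)
    gamma = joint[(0, 0)] + joint[(1, 1)] - b      # advantage over the base rate
    p_f1 = joint[(0, 1)] + joint[(1, 1)]           # Pr[f(x) = 1]
    p_f0 = 1.0 - p_f1
    # Pr[g = 1] * |Pr[D = 1 | g = 1] - b|, written without a division.
    strength_f = abs(joint[(1, 1)] - b * p_f1)     # candidate g = f
    strength_not_f = abs(joint[(1, 0)] - b * p_f0) # candidate g = not f
    return gamma, max(strength_f, strength_not_f)


rng = random.Random(1)
keys = [(0, 0), (0, 1), (1, 0), (1, 1)]
margins = []
for _ in range(5000):
    weights = [rng.expovariate(1.0) for _ in keys]
    total = sum(weights)
    joint = {k: w / total for k, w in zip(keys, weights)}
    gamma, strength = advantage_and_certificate(joint)
    base_rate = joint[(1, 0)] + joint[(1, 1)]
    if base_rate >= 0.5 and gamma > 0:             # hypotheses of the lemma
        margins.append(strength - gamma / 2)
worst_margin = min(margins)
```

A non-negative `worst_margin` (up to rounding) matches the claimed $\gamma/2$ guarantee on the sampled instances.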

###### Proof of Theorem 3.2.

Suppose that the class satisfies . Then by Lemma 3.4, there exists some such that . By the assumption of auditability, we can then use the auditing algorithm to find a group that is an -unfair certificate of . By Lemma 3.3, we know that either or predicts with an accuracy of at least .

In the reverse direction, consider the auditing problem on the classifier . We can treat each pair as a labelled example and learn a hypothesis in that approximates the decisions made by . Suppose that is -unfair. Then by Lemma 3.3, we know that there exists some such that . Therefore, the weak agnostic learning algorithm from the hypothesis of the theorem will return some with . By Lemma 3.4, we know or is a -unfair certificate for . ∎
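The reverse direction of the reduction can be sketched in code: treat each pair of protected features and classifier decision as a labelled example, learn the best predictor in the group class, and return it or its negation as the candidate certificate. The sketch below substitutes a brute-force search over a tiny, hypothetical group class for the weak agnostic learner, so it illustrates the idea rather than the paper's algorithm; all names and data are assumptions.

```python
def agreement(f, xs, decisions):
    """Fraction of examples on which f(x) equals the decision D(X)."""
    return sum(int(f(x) == d) for x, d in zip(xs, decisions)) / len(xs)


def sp_violation(g, xs, decisions):
    """Empirical alpha_SP(g) * beta_SP(g, D) for group indicator g."""
    members = [d for x, d in zip(xs, decisions) if g(x) == 1]
    if not members:
        return 0.0
    alpha = len(members) / len(xs)
    base_rate = sum(decisions) / len(xs)
    return alpha * abs(base_rate - sum(members) / len(members))


def audit_by_learning(group_class, xs, decisions):
    # "Learner": brute-force search for the indicator that best predicts D(X) from x.
    f = max(group_class, key=lambda g: agreement(g, xs, decisions))
    not_f = lambda x: 1 - f(x)
    # Either f or its negation is returned as the candidate unfairness certificate.
    return max((f, not_f), key=lambda g: sp_violation(g, xs, decisions))


# Toy data (assumed): two binary protected attributes; D depends on the first only.
xs = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
decisions = [x[0] for x in xs]
group_class = [
    lambda x: x[0], lambda x: 1 - x[0],   # class closed under negation
    lambda x: x[1], lambda x: 1 - x[1],
]
certificate = audit_by_learning(group_class, xs, decisions)
certificate_violation = sp_violation(certificate, xs, decisions)
```

In practice the brute-force step would be replaced by whatever learning heuristic is available for the group class.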

### 3.2 False Positive Fairness

A corollary of the above reduction is an analogous equivalence between auditing for FP subgroup fairness and agnostic learning. This is because an FP fairness constraint can be viewed as a statistical parity fairness constraint on the subset of the data such that $y = 0$. Therefore, Theorem 3.2 implies the following:

###### Corollary 3.5.

Fix any distribution , and any set of group indicators . The following two relationships hold:

• If there is a auditing algorithm for for all such that , then the class is -weakly agnostically learnable under .

• If is -weakly agnostically learnable under distribution for all such that , then there is a auditing algorithm for FP subgroup fairness for under distribution .
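The observation behind this corollary can be checked on a small example: the FP violation of a group equals the SP violation computed on the negatively labelled examples, rescaled by the fraction of such examples. The sketch assumes $\alpha_{FP}(g,P)=\Pr[g(x)=1,y=0]$ and $\beta_{FP}(g,D,P)=|FP(D)-FP(D,g)|$; function names and data are hypothetical.

```python
def sp_violation(decisions, in_group):
    """alpha_SP * beta_SP on a sample (0.0 for an empty group)."""
    m = len(decisions)
    size = sum(in_group)
    if size == 0:
        return 0.0
    base_rate = sum(decisions) / m
    group_rate = sum(d for d, g in zip(decisions, in_group) if g) / size
    return (size / m) * abs(base_rate - group_rate)


def fp_violation(decisions, in_group, labels):
    """alpha_FP * beta_FP on a sample, computed directly from the definitions."""
    m = len(labels)
    negatives = [i for i in range(m) if labels[i] == 0]
    group_negatives = [i for i in negatives if in_group[i]]
    if not group_negatives:
        return 0.0
    alpha_fp = len(group_negatives) / m                     # Pr[g(x)=1, y=0]
    fp_all = sum(decisions[i] for i in negatives) / len(negatives)
    fp_group = sum(decisions[i] for i in group_negatives) / len(group_negatives)
    return alpha_fp * abs(fp_all - fp_group)


# Toy sample (assumed data).
labels = [0, 0, 0, 0, 1, 1]
in_group = [1, 1, 0, 0, 1, 0]
decisions = [1, 1, 0, 0, 1, 0]

direct = fp_violation(decisions, in_group, labels)

# Same quantity via statistical parity on the y = 0 subset.
neg = [i for i, y in enumerate(labels) if y == 0]
restricted = sp_violation([decisions[i] for i in neg], [in_group[i] for i in neg])
rescaled = (len(neg) / len(labels)) * restricted
```

Here `direct` and `rescaled` coincide, which is why an SP auditor run on the restricted data also audits FP fairness.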

### 3.3 Worst-Case Intractability of Auditing

While we shall see in subsequent sections that the equivalence given above has positive algorithmic and experimental consequences, from a purely theoretical perspective the reduction of agnostic learning to auditing has strong negative worst-case implications. More precisely, we can import a long sequence of formal intractability results for agnostic learning to obtain:

###### Theorem 3.6.

Under standard complexity-theoretic intractability assumptions, for the classes of conjunctions of boolean attributes, linear threshold functions, or bounded-degree polynomial threshold functions, there exist distributions such that the auditing problem cannot be solved in polynomial time, for either statistical parity or false positive fairness.

The proof of this theorem follows from Theorem 3.2, Corollary 3.5, and the following negative results from the learning theory literature. Feldman et al. (2012) show a strong negative result for weak agnostic learning for conjunctions: given a distribution on labeled examples from the hypercube such that there exists a monomial (or conjunction) consistent with a $(1-\epsilon)$-fraction of the examples, it is NP-hard to find a halfspace that is correct on a $(1/2+\epsilon)$-fraction of the examples, for arbitrary constant $\epsilon > 0$. Diakonikolas et al. (2011) show that under the Unique Games Conjecture, no polynomial-time algorithm can find a degree-$d$ polynomial threshold function (PTF) that is consistent with a $(1/2+\epsilon)$ fraction of a given set of labeled examples, even if there exists a degree-$d$ PTF that is consistent with a $1-\epsilon$ fraction of the examples. Diakonikolas et al. (2011) also show that it is NP-hard to find a degree-2 PTF that is consistent with a $(1/2+\epsilon)$ fraction of a given set of labeled examples, even if there exists a halfspace (degree-1 PTF) that is consistent with a $1-\epsilon$ fraction of the examples.

While Theorem 3.6 shows that certain natural subgroup classes yield intractable auditing problems in the worst case, in the rest of the paper we demonstrate that effective heuristics for this problem on specific (non-worst case) distributions can be used to derive an effective and practical learning algorithm for subgroup fairness.

## 4 A Learning Algorithm Subject to Fairness Constraints G

In this section, we present an algorithm for training a (randomized) classifier that satisfies false-positive subgroup fairness simultaneously for all protected subgroups specified by a family of group indicator functions $G$. All of our techniques also apply to a statistical parity or false negative rate constraint.

Let $S$ denote a set of $n$ labeled examples $\{(X_i, y_i)\}_{i=1}^n$, and let $P$ denote the empirical distribution over this set of examples. Let $H$ be a hypothesis class defined over both the protected and unprotected attributes, and let $G$ be a collection of group indicators over the protected attributes. We assume that $H$ contains a constant classifier (which implies that there is at least one fair classifier to be found, for any distribution).

Our goal will be to find the distribution over classifiers from $H$ that minimizes classification error subject to the fairness constraint over $G$. We will design an iterative algorithm that, when given access to a cost-sensitive classification (CSC) oracle, computes an optimal randomized classifier in polynomial time.

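Let $D$ denote a probability distribution over $H$. Consider the following Fair ERM (Empirical Risk Minimization) problem:

$$\min_{D\in\Delta_H}\ \mathbb{E}_{h\sim D}[err(h,P)] \tag{2}$$

$$\text{such that } \forall g\in G:\quad \alpha_{FP}(g,P)\,\beta_{FP}(g,D,P)\le\gamma. \tag{3}$$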

where $err(h,P)=\Pr_{(X,y)\sim P}[h(X)\neq y]$, and the quantities $\alpha_{FP}$ and $\beta_{FP}$ are defined in Definition 2.3. We will write $\mathrm{OPT}$ to denote the objective value at the optimum for the Fair ERM problem, that is, the minimum error achieved by a $\gamma$-fair distribution over the class $H$.

Observe that the optimization is feasible for any distribution $P$: the constant classifiers that label all points 1 or 0 satisfy all subgroup fairness constraints. At the moment, the number of decision variables and constraints may be infinite (if $H$ and $G$ are infinite hypothesis classes), but we will address this momentarily.

###### Assumption 4.1 (Cost-Sensitive Classification Oracle).

We assume our algorithm has access to the cost-sensitive classification oracles $CSC(H)$ and $CSC(G)$ over the classes $H$ and $G$.

Our main theoretical result is a computationally efficient oracle-based algorithm for solving the Fair ERM problem.

###### Theorem 4.2.

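Fix any $\nu, \delta \in (0,1)$. Then given an input of $n$ data points, accuracy parameters $\nu, \delta$, and access to oracles $CSC(H)$ and $CSC(G)$, there exists an algorithm that runs in polynomial time and, with probability at least $1-\delta$, outputs a randomized classifier $\hat{D}$ such that $err(\hat{D},P) \le \mathrm{OPT}+\nu$, and for any $g \in G$, the fairness constraint violation satisfies

$$\alpha_{FP}(g,P)\,\beta_{FP}(g,\hat{D},P)\le\gamma+O(\nu).$$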

#### Overview of our solution.

We present our solution in steps:

• Step 1: Fair ERM as LP.

First, we rewrite the Fair ERM problem as a linear program with finitely many decision variables and constraints even when $H$ and $G$ are infinite. To do this, we take advantage of the fact that Sauer’s Lemma lets us bound the number of labellings that any hypothesis class of bounded VC dimension can induce on any fixed dataset. The LP has one variable for each of these possible labellings, rather than one variable for each hypothesis. Moreover, again by Sauer’s Lemma, we have one constraint for each of the finitely many possible subgroups induced by $G$ on the fixed dataset, rather than one for each of the (possibly infinitely many) subgroups definable over arbitrary datasets. This step is important: it will guarantee that strong duality holds.

• Step 2: Formulation as Game. We then derive the partial Lagrangian of the LP, and note that computing an approximately optimal solution to this LP is equivalent to finding an approximate minmax solution for a corresponding zero-sum game, in which the payoff function is the value of the Lagrangian. The pure strategies of the primal or “Learner” player correspond to classifiers $h \in H$, and the pure strategies of the dual or “Auditor” player correspond to subgroups $g \in G$. Intuitively, the Learner is trying to minimize the sum of the prediction error and a fairness penalty term (given by the Lagrangian), and the Auditor is trying to penalize the fairness violation of the Learner by first identifying the subgroup with the greatest fairness violation and putting all the weight on the dual variable corresponding to this subgroup. In order to reason about convergence, we restrict the set of dual variables to lie in a bounded set: $C$ times the probability simplex. $C$ is a parameter that we have to set in the proof of our theorem to give the best theoretical guarantees, but it is also a parameter that we will vary in the experimental section.

• Step 3: Best Responses as CSC. We observe that given a mixed strategy for the Auditor, the best response problem of the Learner corresponds to a CSC problem. Similarly, given a mixed strategy for the Learner, the best response problem of the Auditor corresponds to an auditing problem (which can be represented as a CSC problem). Hence, if we have oracles for solving CSC problems, we can compute best responses for both players, in response to arbitrary mixed strategies of their opponents.

• Step 4: FTPL for No-Regret. Finally, we show that the ability to compute best responses for each player is sufficient to implement dynamics known to converge quickly to equilibrium in zero-sum games. Our algorithm has the Learner play Follow the Perturbed Leader (FTPL; Kalai and Vempala, 2005), which is a no-regret algorithm, against an Auditor who at every round best responds to the Learner’s mixed strategy. By the seminal result of Freund and Schapire (1996), the average plays of both players converge to an approximate equilibrium. In order to implement this in polynomial time, we need to represent the loss of the Learner as a low-dimensional linear optimization problem. To do so, we first define an appropriately translated CSC problem for any mixed strategy by the Auditor, and cast it as a linear optimization problem.

### 4.1 Rewriting the Fair ERM Problem

To rewrite the Fair ERM problem, we note that even though both $G$ and $H$ can be infinite sets, the sets of possible labellings on the data set $S$ induced by these classes are finite. More formally, we will write $G(S)$ and $H(S)$ to denote the set of all labellings on $S$ that are induced by $G$ and $H$ respectively, that is

$$G(S)=\{(g(x_1),\ldots,g(x_n)) \mid g\in G\}\quad\text{and}\quad H(S)=\{(h(X_1),\ldots,h(X_n)) \mid h\in H\}.$$

We can bound the cardinalities of $G(S)$ and $H(S)$ using Sauer’s Lemma.

###### Lemma 4.3 (Sauer’s Lemma (see e.g. Kearns and Vazirani (1994))).

Let $S$ be a data set of size $n$. Let $d_1=\mathrm{VCDIM}(H)$ and $d_2=\mathrm{VCDIM}(G)$ be the VC-dimensions of the two classes. Then

$$|H(S)|\le O(n^{d_1})\quad\text{and}\quad|G(S)|\le O(n^{d_2}).$$
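To make the counting concrete, the sketch below enumerates the labellings that one-dimensional threshold classifiers (a class of VC dimension 1, chosen here purely for illustration) induce on a small dataset, and compares the count with the standard form of Sauer's bound, $\sum_{i=0}^{d}\binom{n}{i}$. The dataset and function names are hypothetical.

```python
from math import comb


def induced_labelings(points, thresholds):
    """Distinct labelings (h_t(x_1), ..., h_t(x_n)) for h_t(x) = 1[x >= t]."""
    return {tuple(int(x >= t) for x in points) for t in thresholds}


def sauer_bound(n, d):
    """Sauer's Lemma: a class of VC dimension d induces at most this many labelings."""
    return sum(comb(n, i) for i in range(d + 1))


points = [0.3, 1.2, 2.5, 4.0, 4.1]
# A fine grid of thresholds stands in for the infinite class {h_t : t real}.
thresholds = [-1 + 0.01 * k for k in range(701)]
labelings = induced_labelings(points, thresholds)
num_labelings = len(labelings)
bound = sauer_bound(len(points), 1)
```

Although the threshold class is infinite, it induces only six labellings on these five points, which is exactly the bound for $d = 1$.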

Given this observation, we can then consider an equivalent optimization problem where the distribution $D$ is over the set of labellings in $H(S)$, and the set of subgroups is defined by the labellings in $G(S)$. We will view each $g$ in $G(S)$ as a Boolean function.

To simplify notation, we will define the following “fairness violation” functions for any $h \in H(S)$ and any $g \in G(S)$:

$$\Phi_+(h,g)\equiv\alpha_{FP}(g,P)\,(FP(h)-FP(h,g))-\gamma \tag{4}$$

$$\Phi_-(h,g)\equiv\alpha_{FP}(g,P)\,(FP(h,g)-FP(h))-\gamma \tag{5}$$

Moreover, for any distribution $D$ over $H(S)$ and any sign $\bullet \in \{+,-\}$, we define

$$\Phi_\bullet(D,g)=\mathbb{E}_{h\sim D}[\Phi_\bullet(h,g)].$$

###### Claim 4.4.

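For any $g \in G(S)$, any $D \in \Delta_{H(S)}$, and any $\nu > 0$,

$$\max\{\Phi_+(D,g),\Phi_-(D,g)\}\le\nu\quad\text{if and only if}\quad\alpha_{FP}(g,P)\,\beta_{FP}(g,D,P)\le\gamma+\nu.$$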
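The claim amounts to the identity $\max\{\Phi_+(D,g),\Phi_-(D,g)\}=\alpha_{FP}(g,P)\,\beta_{FP}(g,D,P)-\gamma$, which the following sketch verifies on a toy distribution over two classifiers. It assumes that $\Phi_+$ is the mirror image of $\Phi_-$, namely $\alpha_{FP}(g,P)(FP(h)-FP(h,g))-\gamma$, and that $\alpha_{FP}(g,P)=\Pr[g(x)=1,y=0]$; names and data are hypothetical.

```python
def fp_rate(pred, indices):
    """False positive rate of predictions restricted to the given indices."""
    return sum(pred[i] for i in indices) / len(indices)


def phi(pred, in_group, labels, gamma, sign):
    """Phi_+ (sign=+1) or Phi_- (sign=-1) for a single classifier's predictions."""
    negatives = [i for i, y in enumerate(labels) if y == 0]
    group_negatives = [i for i in negatives if in_group[i]]
    alpha = len(group_negatives) / len(labels)            # Pr[g(x)=1, y=0]
    gap = fp_rate(pred, negatives) - fp_rate(pred, group_negatives)
    return alpha * sign * gap - gamma


def weighted_violation(preds, weights, in_group, labels):
    """alpha_FP(g, P) * beta_FP(g, D, P) for a distribution D over classifiers."""
    negatives = [i for i, y in enumerate(labels) if y == 0]
    group_negatives = [i for i in negatives if in_group[i]]
    alpha = len(group_negatives) / len(labels)
    fp_all = sum(w * fp_rate(p, negatives) for p, w in zip(preds, weights))
    fp_group = sum(w * fp_rate(p, group_negatives) for p, w in zip(preds, weights))
    return alpha * abs(fp_all - fp_group)


# Toy data (assumed): six examples, one group, two classifiers mixed 50/50.
labels = [0, 0, 0, 0, 1, 1]
in_group = [1, 1, 0, 0, 1, 0]
preds = [[1, 1, 0, 0, 1, 0], [0, 0, 1, 0, 1, 1]]
weights = [0.5, 0.5]
gamma = 0.05

phi_plus = sum(w * phi(p, in_group, labels, gamma, +1) for p, w in zip(preds, weights))
phi_minus = sum(w * phi(p, in_group, labels, gamma, -1) for p, w in zip(preds, weights))
lhs = max(phi_plus, phi_minus)
rhs = weighted_violation(preds, weights, in_group, labels) - gamma
```

Because both sides agree, bounding the larger of the two signed violations by $\nu$ is the same as bounding the absolute violation by $\gamma+\nu$.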

Thus, we will focus on the following equivalent optimization problem.

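$$\min_{D\in\Delta_{H(S)}}\ \mathbb{E}_{h\sim D}[err(h,P)] \tag{6}$$

$$\text{such that for each } g\in G(S):\quad \Phi_+(D,g)\le 0 \tag{7}$$

$$\Phi_-(D,g)\le 0 \tag{8}$$

For each pair of constraints (7) and (8), corresponding to a group $g \in G(S)$, we introduce a pair of dual variables $\lambda_g^+$ and $\lambda_g^-$. The partial Lagrangian of the linear program is the following:

$$L(D,\lambda)=\mathbb{E}_{h\sim D}[err(h,P)]+\sum_{g\in G(S)}\left(\lambda_g^+\Phi_+(D,g)+\lambda_g^-\Phi_-(D,g)\right)$$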

By Sion’s minmax theorem (Sion, 1958), we have

$$\min_{D\in\Delta_{H(S)}}\ \max_{\lambda\in\mathbb{R}_+^{2|G(S)|}}L(D,\lambda)=\max_{\lambda\in\mathbb{R}_+^{2|G(S)|}}\ \min_{D\in\Delta_{H(S)}}L(D,\lambda)=\mathrm{OPT},$$

where $\mathrm{OPT}$ denotes the optimal objective value in the fair ERM problem. Similarly, the distribution $\arg\min_{D}\max_{\lambda}L(D,\lambda)$ corresponds to an optimal feasible solution to the fair ERM linear program. Thus, finding an optimal solution for the fair ERM problem reduces to computing a minmax solution for the Lagrangian. Our algorithms will both compute such a minmax solution by iteratively optimizing over both the primal variables $D$ and dual variables $\lambda$. In order to guarantee convergence in our optimization, we will restrict the dual space to the following bounded set:

$$\Lambda=\{\lambda\in\mathbb{R}_+^{2|G(S)|} \mid \|\lambda\|_1\le C\},$$

where $C$ will be a parameter of our algorithm. Since $\Lambda$ is a compact and convex set, the minmax condition continues to hold (Sion, 1958):

$$\min_{D\in\Delta_{H(S)}}\ \max_{\lambda\in\Lambda}L(D,\lambda)=\max_{\lambda\in\Lambda}\ \min_{D\in\Delta_{H(S)}}L(D,\lambda) \tag{9}$$

If we knew an upper bound on the norm of the optimal dual solution, then this restriction on the dual solution would not change the minmax solution of the program. We do not in general know such a bound. However, we can show that even though we restrict the dual variables to lie in a bounded set, any approximate minmax solution to Equation 9 is also an approximately optimal and approximately feasible solution to the original fair ERM problem.

###### Theorem 4.5.

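Let $(\hat{D},\hat{\lambda})$ be a $\nu$-approximate minmax solution to the $\Lambda$-bounded Lagrangian problem in the sense that

$$L(\hat{D},\hat{\lambda})\le\min_{D\in\Delta_{H(S)}}L(D,\hat{\lambda})+\nu\quad\text{and}\quad L(\hat{D},\hat{\lambda})\ge\max_{\lambda\in\Lambda}L(\hat{D},\lambda)-\nu.$$

Then $err(\hat{D},P)\le\mathrm{OPT}+2\nu$ and for any $g \in G(S)$,

$$\alpha_{FP}(g,P)\,\beta_{FP}(g,\hat{D},P)\le\gamma+\frac{1+2\nu}{C}.$$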
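One way to see where a bound of this shape comes from is the following chain of inequalities. It is a sketch under stated assumptions, not the proof given in the paper: $D^*$ (a symbol introduced here for illustration) denotes an optimal feasible solution of the fair ERM program with value $\mathrm{OPT}$, $err$ takes values in $[0,1]$, and $\xi$ denotes the largest violation $\max_{g,\bullet}\Phi_\bullet(\hat{D},g)$, assumed positive in the second part.

```latex
\begin{align*}
% Accuracy: lambda = 0 is feasible for the Auditor, and D^* is feasible with
% nonpositive violations, so L(D^*, \hat\lambda) \le OPT.
err(\hat{D},P) = L(\hat{D},0)
  &\le \max_{\lambda\in\Lambda} L(\hat{D},\lambda)
   \le L(\hat{D},\hat{\lambda}) + \nu \\
  &\le \min_{D} L(D,\hat{\lambda}) + 2\nu
   \le L(D^*,\hat{\lambda}) + 2\nu
   \le \mathrm{OPT} + 2\nu. \\[4pt]
% Fairness: the Auditor can put all of its budget C on the most violated constraint.
err(\hat{D},P) + C\,\xi = \max_{\lambda\in\Lambda} L(\hat{D},\lambda)
  &\le \mathrm{OPT} + 2\nu
  \quad\Longrightarrow\quad
  \xi \le \frac{\mathrm{OPT} - err(\hat{D},P) + 2\nu}{C} \le \frac{1+2\nu}{C}.
\end{align*}
```

Combined with Claim 4.4, the bound on $\xi$ translates into the stated bound on $\alpha_{FP}(g,P)\,\beta_{FP}(g,\hat{D},P)$.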

### 4.2 Zero-Sum Game Formulation

To compute an approximate minmax solution, we will first view Equation 9 as the following two player zero-sum matrix game. The Learner (or the minimization player) has pure strategies corresponding to $H(S)$, and the Auditor (or the maximization player) has pure strategies corresponding to the set of vertices in $\Lambda$. More precisely, each vertex or pure strategy either is the all-zero vector or consists of a choice of a $g \in G(S)$, along with the sign $+$ or $-$ that the corresponding $g$-fairness constraint will have in the Lagrangian. More formally, we write

$$\Lambda_{\text{pure}}=\{\lambda\in\Lambda\ \text{with}\ \lambda_g^\bullet=C \mid g\in G(S),\ \bullet\in\{\pm\}\}\cup\{\mathbf{0}\}$$

Even though the number of pure strategies scales linearly with $|G(S)|$, our algorithm will never need to actually represent such vectors explicitly. Note that any vector in $\Lambda$ can be written as a convex combination of the maximization player’s pure strategies, or in other words, as a mixed strategy for the Auditor. For any pair of actions $(h,\lambda)\in H(S)\times\Lambda_{\text{pure}}$, the payoff is defined as

$$U(h,\lambda)=err(h,P)+\sum_{g\in G(S)}\left(\lambda_g^+\Phi_+(h,g)+\lambda_g^-\Phi_-(h,g)\right).$$

###### Claim 4.6.

Let $D \in \Delta_{H(S)}$ and $\lambda \in \Lambda$ be such that $(D,\lambda)$ is a $\nu$-approximate minmax equilibrium in the zero-sum game defined above. Then $(D,\lambda)$ is also a $\nu$-approximate minmax solution for Equation 9.

Our problem reduces to finding an approximate equilibrium for this game. A key step in our solution is the ability to compute best responses for both players in the game, which we now show can be solved by the cost-sensitive classification (CSC) oracles.

#### Learner’s best response as CSC.

Fix any mixed strategy (dual solution) $\lambda \in \Lambda$ of the Auditor. The Learner’s best response is given by:

$$\arg\min_{D\in\Delta_{H(S)}}\ err(D,P)+\sum_{g\in G(S)}\left(\lambda_g^+\Phi_+(D,g)+\lambda_g^-\Phi_-(D,g)\right) \tag{10}$$

Note that it suffices for the Learner to optimize over deterministic classifiers $h \in H(S)$, rather than distributions over classifiers. This is because the Learner is solving a linear optimization problem over the simplex, and so always has an optimal solution at a vertex (i.e. a single classifier $h \in H(S)$). We can reduce this problem to one that can be solved with a single call to a CSC oracle. In particular, we can assign costs to each example $(X_i, y_i)$ as follows:

• if $y_i = 1$, then $c_i^0 = 0$ and $c_i^1 = -\frac{1}{n}$;

• otherwise, $c_i^0 = 0$ and

$$c_i^1=\frac{1}{n}+\frac{1}{n}\sum_{g\in G(S)}(\lambda_g^+-\lambda_g^-)\left(\Pr[g(x)=1\mid y=0]-1\right)\mathbf{1}[g(x_i)=1] \tag{11}$$

Given a fixed set of dual variables , we will write to denote the vector of costs for labelling each datapoint as . That is, is the vector such that for any ,

###### Remark 4.7.

Note that in defining the costs above, we have translated them from their most natural values so that the cost of labeling any example with 0 is 0. In doing so, we recall that by Claim 2.7, the solution to a cost-sensitive classification problem is invariant to translation. As we will see, this will allow us to formulate the learner’s optimization problem as a low-dimensional linear optimization problem, which will be important for an efficient implementation of follow the perturbed leader. In particular, if we find a hypothesis that produces the labels for the points in our dataset, then the cost of this labelling in the CSC problem is by construction .

#### Auditor’s best response as CSC.

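Fix any mixed strategy (primal solution) $D \in \Delta_{H(S)}$ of the Learner. The Auditor’s best response is given by:

$$\arg\max_{\lambda\in\Lambda}\ err(D,P)+\sum_{g\in G(S)}\left(\lambda_g^+\Phi_+(D,g)+\lambda_g^-\Phi_-(D,g)\right)=\arg\max_{\lambda\in\Lambda}\sum_{g\in G(S)}\left(\lambda_g^+\Phi_+(D,g)+\lambda_g^-\Phi_-(D,g)\right) \tag{12}$$

To find the best response, consider the problem of computing $(\hat{g},\hat{\bullet})=\arg\max_{(g,\bullet)}\Phi_\bullet(D,g)$. There are two cases. In the first case, $D$ is a strictly feasible primal solution: that is, $\Phi_{\hat{\bullet}}(D,\hat{g})<0$. In this case, the solution to (12) sets $\lambda=\mathbf{0}$. Otherwise, if $D$ is not strictly feasible, then by the following Lemma 4.8 the best response is to set $\lambda_{\hat{g}}^{\hat{\bullet}}=C$ (and all other coordinates to 0).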
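The case analysis above can be sketched as a small function. The sketch assumes the signed violations $\Phi_\bullet(D,g)$ have already been computed (in the paper this search is delegated to an auditing oracle) and stores them in a dictionary keyed by hypothetical (group, sign) pairs; the names are illustrative only.

```python
def auditor_best_response(phi_values, C):
    """Best response of the Auditor over the set {lambda >= 0, ||lambda||_1 <= C}.

    phi_values: dict mapping (group_name, sign) -> Phi_sign(D, g)
    Returns a dict with the same keys holding the dual variables.
    """
    best_key = max(phi_values, key=phi_values.get)
    lam = {key: 0.0 for key in phi_values}
    # If every constraint is satisfied, the all-zero vector is optimal.
    # (At a violation of exactly zero, both choices give the same payoff.)
    if phi_values[best_key] > 0:
        lam[best_key] = C   # spend the whole budget on the worst violation
    return lam


# Toy violations (assumed numbers) for two groups and both signs.
violations = {
    ("g1", "+"): -0.10, ("g1", "-"): 0.05,
    ("g2", "+"): 0.20, ("g2", "-"): -0.30,
}
response = auditor_best_response(violations, C=10.0)
payoff = sum(response[k] * violations[k] for k in violations)

feasible = {("g1", "+"): -0.10, ("g1", "-"): -0.02}
zero_response = auditor_best_response(feasible, C=10.0)
```

Because the objective is linear in $\lambda$, its maximum over the bounded set is always attained at a vertex, which is why a single coordinate (or none) receives all of the weight.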

###### Lemma 4.8.

Fix any such that . Let be a vector with one non-zero coordinate , where

 (g′,∙′)