
# Individual Fairness Guarantees for Neural Networks

We consider the problem of certifying the individual fairness (IF) of feed-forward neural networks (NNs). In particular, we work with the ϵ-δ-IF formulation, which, given a NN and a similarity metric learnt from data, requires that the output difference between any pair of ϵ-similar individuals is bounded by a maximum decision tolerance δ≥ 0. Working with a range of metrics, including the Mahalanobis distance, we propose a method to overapproximate the resulting optimisation problem using piecewise-linear functions to lower and upper bound the NN's non-linearities globally over the input space. We encode this computation as the solution of a Mixed-Integer Linear Programming problem and demonstrate that it can be used to compute IF guarantees on four datasets widely used for fairness benchmarking. We show how this formulation can be used to encourage models' fairness at training time by modifying the NN loss, and empirically confirm our approach yields NNs that are orders of magnitude fairer than state-of-the-art methods.


## 1 Introduction

Reservations have been raised about the application of neural networks (NN) in contexts where fairness is of concern [Barocas2018BigImpact]. Because of inherent biases present in real-world data, if unchecked, these models have been found to discriminate against individuals on the basis of sensitive features, such as race or sex [Bolukbasi2016ManEmbeddings, Angwin2016MachineBlacks]. Recently, the topic has come under the spotlight, with technologies being increasingly challenged for bias [hardesty_2018, kirk2021OutOfTheBox, the_guardian_2020], leading to the introduction of a range of definitions and techniques for capturing the multifaceted properties of fairness.

Fairness approaches are broadly categorised into: group fairness [Hardt2016EqualityLearning], which inspects the model over data demographics; and individual fairness (IF) [Dwork2012FairnessAwareness], which considers the behaviour over each individual. Despite its wider adoption, group fairness is only concerned with statistical properties of the model so that a situation may arise where predictions of a group-fair model can be perceived as unfair by a particular individual. In contrast, IF is a worst-case measure with guarantees over every possible individual in the input space. However, while techniques exist for group fairness of NNs [albarghouthi2017fairsquare, Bastani2018ProbabilisticConcentration], research on IF has thus far been limited to designing training procedures that favour fairness [yurochkin2020training, yeom2020individual, McNamara2017ProvablyRepresentations] and verification over specific individuals [ruoss2020learning]. To the best of our knowledge, there is currently no work targeted at global certification of IF for NNs.

We develop an anytime algorithm with provable bounds for the certification of IF on NNs. We build on the $\epsilon$-$\delta$-IF formalisation employed by [john2020verifying]. That is, given $\epsilon, \delta \geq 0$ and a distance metric $d_{\text{fair}}$ that captures the similarity between individuals, we ask that, for every pair of points $x'$ and $x''$ in the input space with $d_{\text{fair}}(x', x'') \leq \epsilon$, the NN's output does not differ by more than $\delta$. Although related to it, IF certification on NNs poses a different problem than adversarial robustness [Tjeng2019EvaluatingProgramming], as both $x'$ and $x''$ are here problem variables, spanning the whole space. Hence, local approximation techniques developed in the adversarial literature cannot be employed in the context of IF.

Nevertheless, we show how this global, non-linear requirement can be encoded in Mixed-Integer Linear Programming (MILP) form, by deriving a set of global upper and lower piecewise-linear (PWL) bounds over each activation function in the NN over the whole input space, and performing linear encoding of the (generally non-linear) similarity metric $d_{\text{fair}}$. The formulation of our optimisation as a MILP allows us to compute an anytime, worst-case bound on IF, which can thus be computed using standard solvers from the global optimisation literature [dantzig2016linear]. Furthermore, we demonstrate how our approach can be embedded into the NN training so as to optimise for individual fairness at training time. We do this by performing gradient descent on a weighted loss that also accounts for the maximum $\delta$-variation in $\epsilon$-neighbourhoods for each training point, similarly to what is done in adversarial learning [goodfellow2014explaining, Gowal2018OnModels, wicker2021bayesian].

We apply our method on four benchmarks widely employed in the fairness literature, namely, the Adult, German, Credit and Crime datasets [UCIDatasets], and an array of similarity metrics learnt from data that include $\ell_\infty$, Mahalanobis, and NN embeddings. We empirically demonstrate how our method is able to provide the first, non-trivial IF certificates for NNs commonly employed for tasks from the IF literature, and even larger NNs comprising up to thousands of neurons. Furthermore, we find that our MILP-based fair training approach consistently outperforms, in terms of IF guarantees, NNs trained with a competitive state-of-the-art technique by orders of magnitude, albeit at an increased computational cost.

The paper makes the following main contributions (proofs and additional details can be found in the Appendix of an extended version of the paper, available at http://www.fun2model.org/bibitem.php?key=BPW+22):

• We design a MILP-based, anytime verification approach for the certification of IF as a global property on NNs.

• We demonstrate how our technique can be used to modify the loss function of a NN to take into account certification of IF at training time.

• On four datasets, and an array of metrics, we show how our techniques obtain non-trivial IF certificates and train NNs that are significantly fairer than state-of-the-art.

#### Related Work

A number of works have considered IF by employing techniques from adversarial robustness. [yeom2020individual] rely on randomized smoothing to find the highest stable per-feature difference in a model. Their method, however, provides only (weak) guarantees on model statistics. [yurochkin2020training] present a method for IF training that builds on projected gradient descent and optimal transport. While the method is found to decrease model bias to state-of-the-art results, no formal guarantees are obtained. [ruoss2020learning] adapted the MILP formulation for adversarial robustness to handle fair metric embeddings. However, rather than tackling the IF problem globally as introduced by [Dwork2012FairnessAwareness], the method only works iteratively on a finite set of data, hence leaving open the possibility of unfairness in the model. In contrast, the MILP encoding we obtain through PWL bounding of activations and similarity metrics allows us to provide guarantees over any possible pair of individuals. [Urban2020PerfectlyNetworks] employ static analysis to certify causal fairness. While this method yields global guarantees, it cannot be straightforwardly employed for IF, and it is not anytime, making exhaustive analysis impractical. [john2020verifying] present a method for the computation of IF, though limited to linear and kernel models. MILP and linear relaxation have been employed to certify NNs in local adversarial settings [ehlers2017formal, Tjeng2019EvaluatingProgramming, wicker2020probabilistic]. However, local approximations cannot be employed for the global IF problem. While [katz2017reluplex, leino2021globally] consider global robustness, their methods are restricted to $\ell_p$ metrics. Furthermore, they require the knowledge of a Lipschitz constant or are limited to ReLU.

## 2 Individual Fairness

We focus on regression and binary classification with NNs with real-valued inputs and one-hot encoded categorical features (multi-class can be tackled with component-wise analyses). Such frameworks are often used in automated decision-making, e.g. for loan applications [Hardt2016EqualityLearning]. Formally, given a compact input set $X$ and an output set $Y$, we consider an $L$-layer fully-connected NN $f^w : X \to Y$, parameterised by a vector of weights $w$ trained on a dataset $\mathcal{D}$. For an input $x \in X$, $i = 1, \ldots, L$ and $j = 1, \ldots, n_i$, the NN is defined as:

$$\phi^{(i)}_j = \sum_{k=1}^{n_{i-1}} W^{(i)}_{jk} \zeta^{(i-1)}_k + b^{(i)}_j, \qquad \zeta^{(i)}_j = \sigma^{(i)}\left(\phi^{(i)}_j\right) \tag{1}$$

where $\zeta^{(0)} = x$. Here, $n_i$ is the number of units in the $i$th layer, $W^{(i)}$ and $b^{(i)}$ are its weights and biases, $\sigma^{(i)}$ is the activation function, $\phi^{(i)}$ is the pre-activation and $\zeta^{(i)}$ the activation. The NN output is the result of these computations, $f^w(x) := \zeta^{(L)}$. In regression, $f^w(x)$ is the prediction, while for classification it represents the class probability. In this paper we focus on fully-connected NNs as widely employed in the IF literature [yurochkin2020training, Urban2020PerfectlyNetworks, ruoss2020learning]. However, we should stress that our framework, being based on MILP, can be easily extended to convolutional, max-pool and batch-norm layers or res-nets by using embedding techniques from the adversarial robustness literature (see e.g. [boopathy2019cnn]).
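As a minimal sketch of the layer-wise computation in Equation (1), the following pure-Python snippet evaluates a small fully-connected network. The 2-2-1 architecture, its weights and the helper name `forward` are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def forward(x, weights, biases, activations):
    """Evaluate Equation (1) layer by layer and return zeta^(L)."""
    zeta = list(x)  # zeta^(0) = x
    for W, b, sigma in zip(weights, biases, activations):
        # pre-activations phi^(i)_j = sum_k W^(i)_jk * zeta^(i-1)_k + b^(i)_j
        phi = [sum(W[j][k] * zeta[k] for k in range(len(zeta))) + b[j]
               for j in range(len(b))]
        # activations zeta^(i)_j = sigma^(i)(phi^(i)_j)
        zeta = [sigma(p) for p in phi]
    return zeta

# Hypothetical 2-2-1 network: ReLU hidden layer, sigmoid output.
weights = [[[1.0, -1.0], [0.5, 0.5]], [[1.0, -2.0]]]
biases = [[0.0, -1.0], [0.0]]
output = forward([2.0, 1.0], weights, biases, [relu, sigmoid])
```

For this input the hidden pre-activations are 1 and 0.5, so the output pre-activation is 0 and the predicted class probability is 0.5.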

#### Individual Fairness

Given a NN $f^w$, IF [Dwork2012FairnessAwareness] enforces the property that similar individuals are similarly treated. Similarity is defined according to a task-dependent pseudometric, $d_{\text{fair}}$, provided by a domain expert (e.g., a Mahalanobis distance correlating each feature to the sensitive one), whereas similarity of treatment is expressed via the absolute difference on the NN output $f^w(x)$. We adopt the $\epsilon$-$\delta$-IF formulation of [john2020verifying] for the formalisation of input-output IF similarity.

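###### Definition 1 (ϵ-δ-IF [john2020verifying]).

Consider $\epsilon \geq 0$ and $\delta \geq 0$. We say that $f^w$ is $\epsilon$-$\delta$-individually fair w.r.t. $d_{\text{fair}}$ iff

$$\forall x', x'' \in X \quad \text{s.t.} \quad d_{\text{fair}}(x', x'') \leq \epsilon \implies |f^w(x') - f^w(x'')| \leq \delta.$$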

Here, $d_{\text{fair}}$ measures similarity between individuals and $|f^w(x') - f^w(x'')|$ is the difference in outcomes (class probability for classification). We emphasise that individual fairness is a global notion, as the condition in Definition 1 must hold for all pairs of points in $X$. We remark that the $\epsilon$-$\delta$-IF formulation of [john2020verifying] (which is more general than the IF formulation typically used in the literature [yurochkin2020training, ruoss2020learning]) is a slight variation on the Lipschitz property introduced by [Dwork2012FairnessAwareness]. While introducing greater flexibility thanks to its parametric form, it makes an IF parametric analysis necessary at test time. In Section 4 we analyse how $\epsilon$-$\delta$-IF of NNs is affected by variations of $\epsilon$ and $d_{\text{fair}}$. A crucial component of IF is the similarity $d_{\text{fair}}$. The intuition is that sensitive features, or their sensitive combination, should not influence the NN output. While a number of metrics have been discussed in the literature [ilvento19metric], we focus on the following representative set of metrics which can be automatically learnt from data [john2020verifying, ruoss2020learning, mukherjee20simple, yurochkin2020training]. Details on metric learning are given in Appendix B.

Weighted $\ell_p$: In this case $d_{\text{fair}}(x', x'')$ is defined as a weighted version of an $\ell_p$ metric, i.e. $d_{\text{fair}}(x', x'') = \sqrt[p]{\sum_{i=1}^{n} \theta_i |x'_i - x''_i|^p}$. Intuitively, we set the weights $\theta_i$ related to sensitive features to zero, so that two individuals are considered similar if they only differ with respect to those. The weights for the remaining features can be tuned according to their degree of correlation to the sensitive features.

Mahalanobis: In this case we have $d_{\text{fair}}(x', x'') = \sqrt{(x' - x'')^T S (x' - x'')}$, for a given positive semi-definite (SPD) matrix $S$. The Mahalanobis distance generalises the $\ell_2$ metric by taking into account the intra-correlation of features to capture latent dependencies w.r.t. the sensitive features.

Feature Embedding: The metric is computed on an embedding, so that $d_{\text{fair}}(x', x'') = \hat{d}(\varphi(x'), \varphi(x''))$, where $\hat{d}$ is either the Mahalanobis or the weighted $\ell_p$ metric, and $\varphi$ is a feature embedding map. These allow for greater modelling flexibility, at the cost of reduced interpretability.
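The first two metric families can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the two-feature example (age, sex) and the convention that the weighted max-norm is $\max_i \theta_i |x'_i - x''_i|$ are assumptions made here, not definitions taken from the paper.

```python
import math

def weighted_lp(x1, x2, theta, p):
    """Weighted l_p distance; p = math.inf is taken to mean a weighted max-norm."""
    gaps = [abs(a - b) for a, b in zip(x1, x2)]
    if p == math.inf:
        return max(t * g for t, g in zip(theta, gaps))
    return sum(t * g ** p for t, g in zip(theta, gaps)) ** (1.0 / p)

def mahalanobis(x1, x2, S):
    """sqrt((x1 - x2)^T S (x1 - x2)) for a positive semi-definite matrix S."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    quad = sum(d[i] * S[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)

# Hypothetical individuals described by (age, sex); sex is the sensitive feature.
alice, bob, carol = [30.0, 0.0], [30.0, 1.0], [33.0, 1.0]
theta = [1.0, 0.0]                 # zero weight on the sensitive feature
S = [[1.0, 0.0], [0.0, 0.0]]       # PSD matrix ignoring the sensitive direction

same_up_to_sex = weighted_lp(alice, bob, theta, 2)   # 0.0
age_gap = weighted_lp(alice, carol, theta, 2)        # 3.0
maha_gap = mahalanobis(alice, carol, S)              # 3.0
```

Individuals that differ only in the sensitive feature are at distance zero, so any $\epsilon \geq 0$ treats them as similar.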

### 2.1 Problem Formulation

We aim at certifying $\epsilon$-$\delta$-IF for NNs. To this end we formalise two problems: computing certificates and training for IF.

###### Problem 1 (Fairness Certification).

Given a trained NN $f^w$, a similarity $d_{\text{fair}}$ and a distance threshold $\epsilon \geq 0$, compute

$$\delta_{\max} = \max_{\substack{x', x'' \in X \\ d_{\text{fair}}(x', x'') \leq \epsilon}} |f^w(x') - f^w(x'')|.$$

Problem 1 provides a formulation in terms of optimisation, seeking to compute the maximum output change $\delta_{\max}$ for any pair of input points whose distance is no more than $\epsilon$. One can then compare $\delta_{\max}$ with any threshold $\delta$: if $\delta_{\max} \leq \delta$ holds then the model has been certified to be $\epsilon$-$\delta$-IF.
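To make the role of $\delta_{\max}$ concrete, the sketch below estimates it by random sampling. This is not the certification method of this paper: sampling only yields a lower bound on $\delta_{\max}$, so it can falsify $\epsilon$-$\delta$-IF but never certify it. The logistic model `f`, the unit-weight max-norm `d_fair` and the input set $[0,1]^2$ are hypothetical stand-ins.

```python
import math
import random

def f(x):
    # Hypothetical stand-in model: logistic regression on two features.
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 1.0 * x[1])))

def d_fair(x1, x2):
    # Hypothetical similarity: unweighted max-norm.
    return max(abs(a - b) for a, b in zip(x1, x2))

def sampled_lower_bound(f, d_fair, eps, n_pairs, rng):
    """Largest observed |f(x') - f(x'')| over sampled eps-similar pairs in [0, 1]^2."""
    best = 0.0
    for _ in range(n_pairs):
        x1 = [rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)]
        # Perturb within eps and clip back into the input set.
        x2 = [min(1.0, max(0.0, v + rng.uniform(-eps, eps))) for v in x1]
        if d_fair(x1, x2) <= eps:
            best = max(best, abs(f(x1) - f(x2)))
    return best

lower = sampled_lower_bound(f, d_fair, 0.1, 2000, random.Random(0))
```

If `lower` exceeds a chosen tolerance $\delta$, the model is certainly not $\epsilon$-$\delta$-IF; if it does not, nothing can be concluded, which is why a sound upper bound on $\delta_{\max}$ is needed.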

While Problem 1 is concerned with an already trained NN, the methods we develop can also be employed to encourage IF at training time. Similarly to the approaches for adversarial learning [goodfellow2014explaining], we modify the training loss to balance between the model fit and IF.

###### Problem 2 (Fairness Training).

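Consider an NN $f^w$, a training set $\mathcal{D}$, a similarity metric $d_{\text{fair}}$ and a distance threshold $\epsilon \geq 0$. Let $\lambda \in [0, 1]$ be a constant. Define the IF-fair loss as

$$L_{\text{fair}}(f^w(x_i), y_i, f^w(x_i^*), \lambda) = \lambda L(f^w(x_i), y_i) + (1 - \lambda) |f^w(x_i) - f^w(x_i^*)|,$$

where $x_i^* = \arg\max_{x \in X \text{ s.t. } d_{\text{fair}}(x_i, x) \leq \epsilon} |f^w(x_i) - f^w(x)|$. The $\epsilon$-IF training problem is defined as finding $w_{\text{fair}}$ s.t.:

$$w_{\text{fair}} = \arg\min_{w} \sum_{i=1}^{n_d} L_{\text{fair}}(f^w(x_i), y_i, f^w(x_i^*), \lambda).$$

In Problem 2 we seek to train a NN that not only is accurate, but whose predictions are also fair according to Definition 1. Parameter $\lambda$ balances between accuracy and IF. In particular, for $\lambda = 1$ we recover the standard training that does not account for IF, while for $\lambda = 0$ we only consider IF.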
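The modified loss of Problem 2 can be written as a small function. In this sketch the base loss is assumed to be binary cross-entropy and the names `bce` and `fair_loss` are hypothetical; the prediction at the worst-case neighbour is passed in as `pred_star` rather than computed.

```python
import math

def bce(pred, y):
    """Binary cross-entropy, used here as an assumed choice for the base loss L."""
    tiny = 1e-12  # guards against log(0)
    return -(y * math.log(pred + tiny) + (1 - y) * math.log(1 - pred + tiny))

def fair_loss(pred, y, pred_star, lam, base_loss=bce):
    """lam * L(f(x_i), y_i) + (1 - lam) * |f(x_i) - f(x_i*)|."""
    return lam * base_loss(pred, y) + (1.0 - lam) * abs(pred - pred_star)

standard_only = fair_loss(0.8, 1, 0.3, lam=1.0)   # equals bce(0.8, 1)
fairness_only = fair_loss(0.8, 1, 0.3, lam=0.0)   # equals |0.8 - 0.3|
balanced = fair_loss(0.8, 1, 0.3, lam=0.5)
```

The two extreme values of `lam` reproduce the two regimes described above: standard training and pure fairness optimisation.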

## 3 A MILP Approach For Individual Fairness

Certification of individual fairness on a NN thus requires us to solve the following global, non-convex optimisation problem:

$$\max_{x', x'' \in X} \; |\delta| \quad \text{subject to} \quad \delta = f^w(x') - f^w(x''), \tag{2}$$

$$d_{\text{fair}}(x', x'') \leq \epsilon. \tag{3}$$

We develop a Mixed-Integer Linear Programming (MILP) over-approximation (i.e., providing a sound bound) to this problem. We notice that there are two sources of non-linearity here, one induced by the NN (Equation (2)), which we refer to as the model constraint, and the other by the fairness metric (Equation (3)), which we call the fairness constraint. In the following, we show how these can be modularly bounded by piecewise-linear functions. In Section 3.3 we bring the results together to derive a MILP formulation for $\epsilon$-$\delta$-IF.

### 3.1 Model Constraint

We develop a scheme based on piecewise-linear (PWL) upper and lower bounding for over-approximating all commonly used non-linear activation functions. An illustration of the PWL bound is given in Figure 1. Let $\phi^{(i),L}_j$ and $\phi^{(i),U}_j$ be lower and upper bounds on the pre-activation $\phi^{(i)}_j$ (computed by bound propagation over $X$ [Gowal2018OnModels]). We proceed by building a discretisation grid over the $\phi^{(i)}_j$ values on $M$ grid points $\phi^{(i)}_{j,0}, \ldots, \phi^{(i)}_{j,M}$, with $\phi^{(i)}_{j,0} := \phi^{(i),L}_j$ and $\phi^{(i)}_{j,M} := \phi^{(i),U}_j$, such that, in each partition interval $[\phi^{(i)}_{j,l}, \phi^{(i)}_{j,l+1}]$, we have that $\sigma^{(i)}$ is either convex or concave. We then compute linear lower and upper bound functions for $\sigma^{(i)}$ in each $[\phi^{(i)}_{j,l}, \phi^{(i)}_{j,l+1}]$ as follows. If $\sigma^{(i)}$ is convex (resp. concave) in $[\phi^{(i)}_{j,l}, \phi^{(i)}_{j,l+1}]$, then an upper (resp. lower) linear bound is given by the segment connecting the two extremum points of the interval, and a lower (resp. upper) linear bound is given by the tangent through the mid-point of the interval. We then compute the values of each linear bound in each of its grid points, and select the minimum of the lower bound values and the maximum of the upper bound values, which we store in two vectors $\zeta^{PWL,(i),L}_j$ and $\zeta^{PWL,(i),U}_j$. The following lemma is a consequence of this construction.

###### Lemma 1.

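Let $\phi \in [\phi^{(i),L}_j, \phi^{(i),U}_j]$. Denote with $l$ the index associated to the partition of the grid in which $\phi$ falls and consider $\eta \in [0, 1]$ such that $\phi = \eta \phi^{(i)}_{j,l-1} + (1 - \eta) \phi^{(i)}_{j,l}$. Then:

$$\sigma^{(i)}(\phi) \geq \eta \, \zeta^{PWL,(i),L}_{j,l-1} + (1 - \eta) \, \zeta^{PWL,(i),L}_{j,l}, \qquad \sigma^{(i)}(\phi) \leq \eta \, \zeta^{PWL,(i),U}_{j,l-1} + (1 - \eta) \, \zeta^{PWL,(i),U}_{j,l},$$

that is, $\zeta^{PWL,(i),L}_j$ and $\zeta^{PWL,(i),U}_j$ define continuous PWL lower and upper bounds for $\sigma^{(i)}(\phi)$ in $[\phi^{(i),L}_j, \phi^{(i),U}_j]$.

Lemma 1 guarantees that we can bound the non-linear activation functions using PWL functions. Crucially, PWL functions can then be encoded into the MILP constraints.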
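The chord-and-tangent construction behind Lemma 1 can be sketched for the sigmoid as follows. This is an illustrative sketch under stated assumptions: a uniform grid on $[-6, 6]$ that contains the inflection point 0, and hypothetical helper names `pwl_bounds` and `interpolate`.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def pwl_bounds(grid):
    """Grid values of the PWL lower/upper bounds (grid must contain 0 if it straddles it)."""
    low = [math.inf] * len(grid)
    up = [-math.inf] * len(grid)
    for l in range(len(grid) - 1):
        a, b = grid[l], grid[l + 1]
        mid = 0.5 * (a + b)
        # Tangent through the mid-point, evaluated at both interval ends.
        tangent = [sigmoid(mid) + dsigmoid(mid) * (t - mid) for t in (a, b)]
        # Chord through the interval ends.
        chord = [sigmoid(a), sigmoid(b)]
        # Sigmoid is convex for x <= 0 (tangent below, chord above) and concave for x >= 0.
        lo_line, up_line = (tangent, chord) if b <= 0.0 else (chord, tangent)
        for k, idx in enumerate((l, l + 1)):
            low[idx] = min(low[idx], lo_line[k])   # minimum of the lower bound values
            up[idx] = max(up[idx], up_line[k])     # maximum of the upper bound values
    return low, up

def interpolate(grid, values, x):
    """Convex combination eta * v[l] + (1 - eta) * v[l+1] on the interval containing x."""
    for l in range(len(grid) - 1):
        if grid[l] <= x <= grid[l + 1]:
            eta = (grid[l + 1] - x) / (grid[l + 1] - grid[l])
            return eta * values[l] + (1.0 - eta) * values[l + 1]
    raise ValueError("x outside the grid")

grid = [-6.0 + 1.5 * l for l in range(9)]   # -6, -4.5, ..., 0, ..., 6
low, up = pwl_bounds(grid)
```

Taking the minimum (resp. maximum) at shared grid points keeps the interpolant below (resp. above) every per-interval linear bound, which is what makes the resulting PWL functions continuous and sound.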

###### Proposition 1.

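Let $y^{(i)}_{j,l}$, for $l = 1, \ldots, M$, be binary variables, and $\eta^{(i)}_{j,l} \in [0, 1]$ be continuous ones. Consider $\phi^{(i)}_j \in [\phi^{(i),L}_j, \phi^{(i),U}_j]$; then it follows that $\zeta^{(i)}_j = \sigma^{(i)}(\phi^{(i)}_j)$ implies:

$$\sum_{l=1}^{M} y^{(i)}_{j,l} = 1, \quad \sum_{l=1}^{M} \eta^{(i)}_{j,l} = 1, \quad \phi^{(i)}_j = \sum_{l=1}^{M} \phi^{(i),L}_{j,l} \, \eta^{(i)}_{j,l}, \quad y^{(i)}_{j,l} \leq \eta^{(i)}_{j,l} + \eta^{(i)}_{j,l+1},$$

$$\sum_{l=1}^{M} \zeta^{PWL,(i),L}_{j,l} \, \eta^{(i)}_{j,l} \leq \zeta^{(i)}_j \leq \sum_{l=1}^{M} \zeta^{PWL,(i),U}_{j,l} \, \eta^{(i)}_{j,l}.$$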

A proof can be found in Appendix A. Proposition 1 ensures that the global behaviour of each NN neuron can be over-approximated by linear constraints using auxiliary variables. Employing Proposition 1 we can encode the model constraint of Equation (2) into the MILP form in a sound way.

The over-approximation error does not depend on the MILP formulation (which is exact), but on the PWL bounding, and is hence controllable through the selection of the number of grid points $M$, and becomes exact in the limit. Notice that in the particular case of ReLU activation functions the over-approximation is exact for any $M$.

###### Proposition 2.

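Assume $\sigma^{(i)}$ to be continuously differentiable everywhere in $[\phi^{(i),L}_j, \phi^{(i),U}_j]$, except possibly in a finite set. Then the PWL lower and upper bounding functions of Lemma 1 converge uniformly to $\sigma^{(i)}$ as $M$ goes to infinity.

Furthermore, define $\Delta_M = \frac{\phi^{(i),U}_j - \phi^{(i),L}_j}{M}$, then for finite values of $M$ the error on the lower (resp. upper) bounding in convex (resp. concave) regions of $\sigma^{(i)}$ for $\phi \in [\phi^{(i)}_{j,l}, \phi^{(i)}_{j,l+1}]$ is given by:

$$e_1(\phi) \leq \frac{\Delta_M}{2} \left( \sigma'\left(\phi^{(i)}_{j,l+1}\right) - \sigma'\left(\phi^{(i)}_{j,l+1} - \frac{\Delta_M}{2}\right) \right)$$

and upper (resp. lower) in concave (resp. convex) regions:

$$e_2(\phi) \leq \Delta_M \left( \frac{\sigma\left(\phi^{(i)}_{j,l} + \Delta_M\right) - \sigma\left(\phi^{(i)}_{j,l}\right)}{\Delta_M} + \sigma'\left(\phi^{(i)}_{j,l}\right) \right).$$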

A proof of Proposition 2 is given in Appendix A, alongside an experimental analysis of the convergence rate.
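The convergence stated in Proposition 2 can be observed numerically. The sketch below measures, on the concave region $[0, 6]$ of the sigmoid, the largest distance between the mid-point tangent (upper bound) and the chord (lower bound) for an increasing number of uniform intervals. The interval $[0, 6]$, the uniform grid and the helper name `max_gap` are assumptions made for this illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def max_gap(M, lo=0.0, hi=6.0, samples_per_interval=50):
    """Largest sampled distance between tangent upper bound and chord lower bound."""
    step = (hi - lo) / M
    worst = 0.0
    for l in range(M):
        a, b = lo + l * step, lo + (l + 1) * step
        mid = 0.5 * (a + b)
        slope = (sigmoid(b) - sigmoid(a)) / (b - a)
        for s in range(samples_per_interval + 1):
            x = a + (b - a) * s / samples_per_interval
            chord = sigmoid(a) + slope * (x - a)                  # lower bound (concave region)
            tangent = sigmoid(mid) + dsigmoid(mid) * (x - mid)    # upper bound (concave region)
            worst = max(worst, tangent - chord)
    return worst

# Gap between the two bounds for M = 4, 8, 16, 32 intervals.
gaps = {M: max_gap(M) for M in (4, 8, 16, 32)}
```

Halving the interval width roughly quarters the gap, in line with the second-order behaviour of chord and tangent approximations of a smooth function.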

We remark that the PWL bound can be used over all commonly employed activation functions $\sigma$. The only assumption made is that $\sigma$ has a finite number of inflection points over any compact interval of $\mathbb{R}$. For convergence (Prop. 2) we require continuous differentiability almost everywhere, which is satisfied by commonly used activations.

### 3.2 Fairness Constraint

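The encoding of the fairness constraint within the MILP formulation depends on the specific form of the metric $d_{\text{fair}}$.

Weighted $\ell_p$ Metric: The weighted $\ell_p$ metric can be tackled by employing rectangular approximation regions. While this is straightforward for the $\ell_\infty$ metric, for the remaining cases interval abstraction can be used [dantzig2016linear].

Mahalanobis Metric: We first compute an orthogonal decomposition of the Mahalanobis matrix $S$ as in $U^T \Lambda U = S$, where $U$ is the eigenvector matrix of $S$ and $\Lambda$ is a diagonal matrix with eigenvalues as entries. Consider the rotated variables $z' = U x'$ and $z'' = U x''$, then we have that Equation (3) can be re-written as $(z' - z'')^T \Lambda (z' - z'') \leq \epsilon^2$. By simple algebra we thus have that, for each $i$, $(z'_i - z''_i)^2 \leq \frac{\epsilon^2}{\Lambda_{ii}}$. By transforming back to the original variables, we obtain that Equation (3) can be over-approximated by:

$$-\frac{\epsilon}{\sqrt{\mathrm{diag}(\Lambda)}} \leq U x' - U x'' \leq \frac{\epsilon}{\sqrt{\mathrm{diag}(\Lambda)}}.$$

Feature Embedding Metric: We tackle the case in which the embedding map $\varphi$ used in the metric definition, i.e. $d_{\text{fair}}(x', x'') = \hat{d}(\varphi(x'), \varphi(x''))$, is a NN embedding. This is straightforward as $\varphi$ can be encoded into MILP as for the model constraint.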
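The box relaxation of the Mahalanobis constraint can be checked numerically in two dimensions. In the sketch below, the rotation angle defining $U$, the eigenvalues in $\Lambda$ and the sampling region are hypothetical; the code verifies that every sampled difference vector inside the ellipsoid also satisfies the box constraint, and that a corner of the box lies outside the ellipsoid (so the relaxation is a strict over-approximation).

```python
import math
import random

angle = 0.6                                   # hypothetical rotation defining U
U = [[math.cos(angle), -math.sin(angle)],
     [math.sin(angle),  math.cos(angle)]]
lam = [4.0, 0.25]                             # hypothetical eigenvalues, diag(Lambda)

def rotate(v):
    """z = U v."""
    return [U[0][0] * v[0] + U[0][1] * v[1], U[1][0] * v[0] + U[1][1] * v[1]]

def rotate_back(z):
    """v = U^T z (U is orthogonal)."""
    return [U[0][0] * z[0] + U[1][0] * z[1], U[0][1] * z[0] + U[1][1] * z[1]]

def mahalanobis(d):
    """sqrt(d^T U^T Lambda U d) for a difference vector d = x' - x''."""
    z = rotate(d)
    return math.sqrt(sum(l * zi * zi for l, zi in zip(lam, z)))

def in_box(d, eps):
    """Box relaxation: |(U d)_i| <= eps / sqrt(Lambda_ii) for every i."""
    z = rotate(d)
    return all(abs(zi) <= eps / math.sqrt(l) + 1e-12 for l, zi in zip(lam, z))

eps = 0.5
rng = random.Random(0)
samples = [[rng.uniform(-3, 3), rng.uniform(-3, 3)] for _ in range(5000)]
inside_ellipsoid = [d for d in samples if mahalanobis(d) <= eps]
violations = [d for d in inside_ellipsoid if not in_box(d, eps)]

# A corner of the box satisfies the relaxation but has Mahalanobis distance eps * sqrt(2).
corner = rotate_back([eps / math.sqrt(lam[0]), eps / math.sqrt(lam[1])])
corner_distance = mahalanobis(corner)
```

The corner case shows the price of linearity: pairs at Mahalanobis distance up to $\epsilon\sqrt{n}$ (here $n = 2$) can enter the relaxed problem, which keeps the certificate sound but can make it looser.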

### 3.3 Overall Formulation

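We now formulate the MILP encoding for the over-approximation of $\epsilon$-$\delta$-IF. For Equation (2), we proceed by deriving a set of approximating constraints for the variables $x'$ and $x''$ by using the techniques described in Section 3.1. We denote the corresponding variables as $\phi'^{(i)}_j$, $\zeta'^{(i)}_j$ and $\phi''^{(i)}_j$, $\zeta''^{(i)}_j$, respectively. The NN final output on $x'$ and on $x''$ will then respectively be $\zeta'^{(L)}$ and $\zeta''^{(L)}$, so that $\delta = \zeta'^{(L)} - \zeta''^{(L)}$. Finally, we over-approximate Equation (3) as described in Section 3.2. In the case of Mahalanobis distance, we thus obtain:

$$\begin{aligned}
\max_{x', x'' \in X} \quad & |\delta| \\
\text{subject to} \quad & \delta = \zeta'^{(L)} - \zeta''^{(L)}, \\
& \text{for } i = 1, \ldots, L,\ j = 1, \ldots, n_i,\ \dagger \in \{', ''\}: \\
& \sum_{l=1}^{M} y^{\dagger(i)}_{j,l} = 1, \quad \sum_{l=1}^{M} \eta^{\dagger(i)}_{j,l} = 1, \quad y^{\dagger(i)}_{j,l} \leq \eta^{\dagger(i)}_{j,l} + \eta^{\dagger(i)}_{j,l+1}, \\
& \phi^{\dagger(i)}_j = \sum_{k=1}^{n_{i-1}} W^{(i)}_{jk} \zeta^{\dagger(i-1)}_k + b^{(i)}_j, \quad \phi^{\dagger(i)}_j = \sum_{l=1}^{M} \phi^{(i),L}_{j,l} \, \eta^{\dagger(i)}_{j,l}, \\
& \sum_{l=1}^{M} \zeta^{PWL,(i),L}_{j,l} \, \eta^{\dagger(i)}_{j,l} \leq \zeta^{\dagger(i)}_j \leq \sum_{l=1}^{M} \zeta^{PWL,(i),U}_{j,l} \, \eta^{\dagger(i)}_{j,l}, \\
& -\frac{\epsilon}{\sqrt{\mathrm{diag}(\Lambda)}} \leq U x' - U x'' \leq \frac{\epsilon}{\sqrt{\mathrm{diag}(\Lambda)}},
\end{aligned} \tag{4}$$

where $\zeta^{\dagger(0)} = x^{\dagger}$.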

Though similar, the above MILP is significantly different from those used for adversarial robustness (see e.g. [Tjeng2019EvaluatingProgramming]). First, rather than looking for perturbations around a fixed point, here we have both $x'$ and $x''$ as variables. Furthermore, rather than being local, the MILP problem for $\epsilon$-$\delta$-IF is global, over the whole input space $X$. As such, local approximations of non-linearities cannot be used, as the bounding needs to be valid simultaneously over the whole input space. Finally, while in adversarial robustness one can ignore the last sigmoid layer, for IF, because of the two optimisation variables, one cannot simply map from the last pre-activation value to the class probability, so that even for ReLU NNs one needs to employ bounding of non-piecewise-linear activations for the final sigmoid.

By combining the results from this section, we have:

###### Theorem 1.

Consider $\epsilon \geq 0$, a similarity $d_{\text{fair}}$ and a NN $f^w$. Let $x'_*$ and $x''_*$ be the optimal points for the optimisation problem in Equation (4), and define $\delta_*$ as the corresponding optimal value of $|\delta|$. Then $f^w$ is $\epsilon$-$\delta$-individually fair w.r.t. $d_{\text{fair}}$ for any $\delta \geq \delta_*$.

Theorem 1, whose proof can be found in Appendix A, states that a solution of the MILP problem provides us with a sound estimation of individual fairness of an NN. Crucially, it can be shown that branch-and-bound techniques for the solution of MILP problems converge in finite time to the optimal solution [del2012convergence], while furthermore providing us with upper and lower bounds for the optimal value at each iteration step. Therefore, we have:

###### Corollary 1.

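Let $\delta^L_k$ and $\delta^U_k$ be lower and upper bounds on the optimal value $\delta_*$ of Equation (4) computed by a MILP solver at step $k > 0$. Then we have that: $\delta^L_k \leq \delta_* \leq \delta^U_k$. Furthermore, given a precision, $\tau$, there exists a finite $k_*$ such that $\delta^U_{k_*} - \delta^L_{k_*} \leq \tau$.

That is, our method is sound and anytime, as at each iteration step in the MILP solving we can retrieve a lower and an upper bound on $\delta_*$, which can thus be used to provide provable guarantees while converging to $\delta_*$ in finite time.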
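The anytime behaviour described by Corollary 1 can be illustrated with a toy branch-and-bound procedure. This is not a MILP solver: it maximises a one-dimensional function with a known Lipschitz constant by interval splitting, a deliberately simpler stand-in that exhibits the same pattern of a lower and an upper bound that bracket the optimum at every step and meet within a chosen precision. The objective `g`, its Lipschitz constant and the function name `anytime_maximise` are hypothetical.

```python
import heapq
import math

def anytime_maximise(g, lipschitz, lo, hi, tau):
    """Return the list of (lower, upper) bounds on max g over [lo, hi], one per step."""
    mid = 0.5 * (lo + hi)
    lower = g(mid)  # best value found so far: a valid lower bound
    # Max-heap (via negation) of intervals keyed by their optimistic upper bound.
    heap = [(-(g(mid) + lipschitz * (hi - lo) / 2.0), lo, hi)]
    history = []
    while True:
        upper = -heap[0][0]  # largest optimistic bound over all open intervals
        history.append((lower, upper))
        if upper - lower <= tau:
            return history
        _, a, b = heapq.heappop(heap)
        m = 0.5 * (a + b)
        for c, d in ((a, m), (m, b)):  # branch: split the most promising interval
            centre = 0.5 * (c + d)
            value = g(centre)
            lower = max(lower, value)
            heapq.heappush(heap, (-(value + lipschitz * (d - c) / 2.0), c, d))

def g(x):
    # Hypothetical objective with Lipschitz constant 3.5 on the real line.
    return math.sin(3.0 * x) + 0.5 * x

history = anytime_maximise(g, 3.5, 0.0, 4.0, tau=1e-3)
final_lower, final_upper = history[-1]
```

Stopping the loop early at any step still returns a valid bracket, which is the property that allows a time cutoff to be used without losing soundness.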

#### Complexity Analysis

The encoding of the model constraint can be done in , where is the maximum width of , is the number of layers, and is the number of grid points used for the PWL bound. The computational complexity of the fairness constraints depends on the similarity metric employed. While for no processing needs to be done, the computational complexity is for the Mahalanobis distance and again for the feature embedding metric. Each iteration of the MILP solver entails the solution of a linear programming problem and is hence . Finite time convergence of the MILP solver to with precision is exponential in the number of problem variables, in and .

### 3.4 Fairness Training for Neural Networks

The $\epsilon$-$\delta$-IF MILP formulation introduced in Section 3 can be adapted for the solution of Problem 2. The key step is the computation of $x_i^*$ in the second component of the modified loss introduced in Problem 2, which is used to introduce fairness directly into the loss of the neural network. This computation can be done by observing that, for every training point $x_i$ drawn from $\mathcal{D}$, the computation of $x_i^*$ is a particular case of the formulation described in Section 3, where, instead of having two variable input points, only one input point is a problem variable while the other is given and drawn from the training dataset $\mathcal{D}$. Therefore, $x_i^*$ can be computed by solving the MILP problem, where we fix a set of the problem variables to $x_i$, and can be subsequently used to obtain the value of the modified loss function. Note that these constraints are not cumulative, since they are built for each mini-batch, and discarded after the optimisation is solved to update the weights.

We summarise our fairness training method in Algorithm 1. For each batch in each of the training epochs, we perform a forward pass of the NN to obtain the output (line 5). We then formulate the MILP problem as in Section 3 (line 6), and initialise an empty set variable to collect the solutions to the various sub-problems (line 7). Then, for each training point $x_i$ in the mini-batch, we fix the MILP constraints to the variables associated with $x_i$ (line 9), solve the resulting MILP for $x_i^*$, and place $x_i^*$ in the set that collects the solutions. Finally, we compute the NN predictions on the collected points (line 13); the result is used to compute the modified loss function (line 14) and the weights are updated by taking a step of gradient descent. The resulting set of weights balances the empirical accuracy and fairness around the training points.
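The per-point inner maximisation of the training loop can be sketched for a special case in which no MILP is needed. The snippet assumes a single-layer logistic model, an unweighted max-norm similarity and an unbounded input space; under these assumptions the worst-case neighbour lies at one of two corners of the $\epsilon$-box and can be found in closed form. The function names and the example batch are hypothetical, and for deeper networks or other metrics this step would be replaced by the MILP solve described above.

```python
import math
import random

def predict(w, b, x):
    """Single-layer logistic model f(x) = sigmoid(w . x + b)."""
    return 1.0 / (1.0 + math.exp(-(sum(wk * xk for wk, xk in zip(w, x)) + b)))

def worst_case_point(w, b, x, eps):
    """Closed-form argmax of |f(x) - f(x')| over the max-norm ball of radius eps around x."""
    sign = lambda v: (v > 0) - (v < 0)
    up = [xk + eps * sign(wk) for wk, xk in zip(w, x)]     # pushes the logit up
    down = [xk - eps * sign(wk) for wk, xk in zip(w, x)]   # pushes the logit down
    base = predict(w, b, x)
    return max((up, down), key=lambda c: abs(predict(w, b, c) - base))

# Hypothetical model and mini-batch.
w, b, eps = [1.5, -0.5, 0.0], 0.2, 0.1
batch = [[0.2, 0.4, 1.0], [0.9, 0.1, 0.0], [0.5, 0.5, 1.0]]

# Collect one worst-case neighbour per training point, then predict on them.
x_star = [worst_case_point(w, b, x, eps) for x in batch]
preds = [predict(w, b, x) for x in batch]
preds_star = [predict(w, b, xs) for xs in x_star]
fairness_terms = [abs(p - ps) for p, ps in zip(preds, preds_star)]
```

The list `fairness_terms` holds the $|f^w(x_i) - f^w(x_i^*)|$ values that enter the second component of the modified loss for this mini-batch.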

The choice of $\lambda$ affects the relative importance of standard training w.r.t. the fairness constraint: $\lambda = 1$ is equivalent to standard training, while $\lambda = 0$ only optimises for fairness. In our experiments we keep for half of the training epochs, and then change it to .

## 4 Experiments

In this section, we empirically validate the effectiveness of our MILP formulation for computing $\epsilon$-$\delta$-IF guarantees as well as for fairness training of NNs. We perform our experiments on four UCI datasets [UCIDatasets]: the Adult dataset (predicting income), the Credit dataset (predicting payment defaults), the German dataset (predicting credit risk) and the Crime dataset (predicting violent crime). In each case, features encoding information regarding gender or race are considered sensitive. In the certification experiments we employ a precision for the MILP solvers of and a time cutoff of seconds. We compare our training approach with two different learning methods: Fairness-Through-Unawareness (FTU), in which the sensitive features are simply removed, and SenSR [yurochkin2020training]. Exploration of the cutoff, group fairness, certification of additional NNs, scalability of the methods and additional details on the experimental settings are given in Appendix D and C. (An implementation of the method and of the experiments can be found at https://github.com/eliasbenussi/nn-cert-individual-fairness.)

#### Fairness Certification

We analyse the suitability of our method in providing non-trivial certificates on $\epsilon$-$\delta$-IF with respect to the similarity threshold $\epsilon$ (which we vary from to ), the similarity metric $d_{\text{fair}}$, the width of the NN (from to ), and its number of layers (from to ). These reflect the characteristics of NNs and metrics used in the IF literature [yurochkin2020training, ruoss2020learning, Urban2020PerfectlyNetworks]; for experiments on larger architectures, demonstrating the scalability of our approach, see Appendix D.3. For each dataset we train the NNs by employing the FTU approach.

The results for these analyses are plotted in Figure 2 for the Adult and the Crime datasets (results for Credit and German datasets can be found in Appendix D.1). Each heat map depicts the variation of $\delta_{\max}$ as a function of $\epsilon$ and the NN architecture. The top row in the figure was computed by considering the Mahalanobis similarity metric; the bottom row was computed for a weighted $\ell_\infty$ metric (with coefficients chosen as in [john2020verifying]) and results for the feature embedding metrics are given in Appendix D.2. As one might expect, we observe that, across all the datasets and architectures, increasing $\epsilon$ correlates with an increase in the values for $\delta_{\max}$, as higher values of $\epsilon$ allow for greater feature changes. Interestingly, $\delta_{\max}$ tends to decrease (i.e., the NN becomes more fair) as we increase the number of NN layers. This is the opposite of what is observed for adversarial robustness, where increased capacity generally implies more fragile models [madry2017towards]. In fact, as those NNs are trained via FTU, the main sensitive features are not accessible to the NN. A possible explanation is that, as the number of layers increases, the NN's dependency on the specific value of each feature diminishes, and the output becomes dependent on their nonlinear combination. The result suggests that over-parametrised NNs could be more adept at solving fair tasks (at least for IF definitions), though this would come with a loss of model interpretability, and exploration would be needed to assess under which conditions this holds. Finally, we observe that our analysis confirms how FTU training is generally insufficient in providing fairness on the model behaviour for $\epsilon$-$\delta$-IF. For each model, individuals that are dissimilar by can already yield a , meaning they would get assigned to different classes if one was using the standard classification threshold of .

#### Fairness Training

We investigate the behaviour of our fairness training algorithm for improving $\epsilon$-$\delta$-IF of NNs. We compare our method with FTU and SenSR [yurochkin2020training]. For ease of comparison, in the rest of this section we measure fairness with $d_{\text{fair}}$ equal to the Mahalanobis similarity metric, with , for which SenSR was developed.

The results for this analysis are given in Figure 3, where each point in the scatter plot represents the values obtained for a given NN architecture. We train architectures with up to hidden layers and units, in order to be comparable to those trained by [yurochkin2020training]. As expected, we observe that FTU performs the worst in terms of certified fairness, as simple omission of the sensitive features is unable to obfuscate latent dependencies between the sensitive and non-sensitive features. As previously reported in the literature, SenSR significantly improves on FTU by accounting for the features' latent dependencies. However, on all four datasets, our MILP-based training methodology consistently improves IF by orders of magnitude across all the architectures when compared to SenSR. In particular, for the architectures with more than one hidden layer, on average, MILP outperforms FTU by a factor of and SenSR by . Intuitively, while SenSR and our approach have a similar formulation, the former is based on gradient optimisation so that no guarantees are provided in the worst case for the training loss. In contrast, by relying on MILP, our method optimises the worst-case behaviour of the NN at each step, which further encourages training of individually fair models. The cost of the markedly improved guarantees is, of course, a higher computational cost. In fact, the training of the models in Figure 3 with MILP had an average training time of about hours. While the increased cost is significant, we highlight that this is a cost that is only paid once and may be justified in sensitive applications by the necessity of fairness at deployment time. We furthermore notice that, while our implementation is sequential, parallel per-batch solution of the MILP problems during training would markedly reduce the computational time, and we leave for future work the parallelisation and tensorisation of the techniques.
Interestingly, we find that balanced accuracy also slightly improved with SenSR and MILP training in the tasks considered here, possibly as a result of the bias in the class labels w.r.t. sensitive features. Finally, in Figure 4 we further analyse the certified $\delta$-profile w.r.t. the input similarity $\epsilon$, varying the value of $\epsilon$ used for the certification of $\epsilon$-$\delta$-IF. In the experiment, both SenSR and MILP are trained with , which means that our method, based on formal IF certificates, is guaranteed to outperform SenSR up until (as in fact is the case). Beyond , no such statement can be made, and it is still theoretically possible for SenSR to outperform MILP in particular circumstances. Empirically, however, MILP-based training still largely outperforms SenSR in terms of certified fairness obtained.

## 5 Conclusion

We introduced an anytime MILP-based method for the certification and training of $\epsilon$-$\delta$-IF in NNs, based on PWL bounding and MILP encoding of non-linearities and similarity metrics. In an experimental evaluation comprising four datasets, a selection of widely employed NN architectures and three types of similarity metrics, we empirically found that our method is able to provide the first non-trivial certificates for $\epsilon$-$\delta$-IF in NNs and yields NNs which are, consistently, orders of magnitude more fair than those obtained by a competitive IF training technique.

#### Acknowledgements

This project was funded by the ERC European Union’s Horizon 2020 research and innovation programme (FUN2MODEL, grant agreement No. 834115).

## References

Appendix to:

Individual Fairness Guarantees for Neural Networks

In Section A we empirically investigate the convergence of the PWL bounds w.r.t.  in the sigmoid case, and provide detailed proofs for the statements of propositions and theorem from the main paper. In Section B we discuss how the learning of the similarity metric was performed. Section C details the experimental settings used in the paper and briefly describes fairness-through-unawareness and SenSR. Finally, additional experimental results on group fairness, verification, and feature embedding metrics are given in Section D.

## Appendix A Additional Details on MILP

### A.1 Analysis of Number of Grid Points

Interestingly, by inspecting the error bounds derived in Proposition 2 we notice how the uniform error of the PWL bounds goes to zero with the product between the inverse of and the increments of the derivative of parametrised with the inverse of . In practice, this means that choosing the interval points of the grid adaptively depending on the values of yields an improved rate of convergence for the bounds. In fact, in Appendix A, by choosing the grid points in inverse proportion to in practice, for , we have almost perfect overlap of the PWL with . We visualised this in Figure 1 in the main paper, where we plot the lower and upper PWL functions used in our MILP construction (the plots illustrate the explicit case of the sigmoid activation function in the interval ). The inflection point of the sigmoid is at the origin, so it is straightforward to split the x-axis into the convex and concave parts of the sigmoid. In particular, we achieve this by using a non-uniform discretisation of the x-axis that follows the y-axis of the plot. Empirically, we found that this provides better bounds than a uniform x-axis discretisation when (the number of grid points used) is small. The figures visually show how the bounds converge as increases. Already for the maximum approximation error is of the order of , and thus this is the value we utilise in the experiments.
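The convergence behaviour described above can be checked numerically. The following is a minimal sketch, not the implementation used in the paper: it assumes a uniform grid on the convex half of the sigmoid (the interval `[-5, 0]` is an arbitrary choice), a midpoint-tangent lower bound and a chord upper bound, and it reports the largest gap between the two bounds for a few grid sizes. The function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def pwl_bounds(x, grid):
    """Return (lower, upper) PWL bounds at x on a region where the sigmoid is convex."""
    for lo, hi in zip(grid, grid[1:]):
        if lo <= x <= hi:
            c = 0.5 * (lo + hi)
            # Lower bound: tangent line taken at the segment midpoint.
            lower = sigmoid(c) + (x - c) * dsigmoid(c)
            # Upper bound: chord joining the two segment endpoints.
            slope = (sigmoid(hi) - sigmoid(lo)) / (hi - lo)
            upper = sigmoid(lo) + (x - lo) * slope
            return lower, upper
    raise ValueError("x lies outside the grid")

def max_gap(M, left=-5.0, right=0.0, samples=2000):
    """Largest sampled distance between the upper and lower bound for M segments."""
    grid = [left + (right - left) * l / M for l in range(M + 1)]
    worst = 0.0
    for k in range(samples + 1):
        x = left + (right - left) * k / samples
        lower, upper = pwl_bounds(x, grid)
        # Soundness check: the bounds must sandwich the true activation.
        assert lower <= sigmoid(x) + 1e-12 and sigmoid(x) <= upper + 1e-12
        worst = max(worst, upper - lower)
    return worst

# Uniform grid is an assumption of this sketch; the text uses a non-uniform one.
gaps = {M: max_gap(M) for M in (2, 8, 32)}
for M, gap in gaps.items():
    print(f"M={M:3d}  max gap={gap:.6f}")
```

On the concave half of the sigmoid the roles of the tangent and the chord are swapped.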

#### Proof of Proposition 1

Consider the -th activation function and the -th layer. We want to show that every time it follows that there exist values for and for , such that satisfies the constraints in the proposition statement. This would imply that the feasible region defined by the latter equation is larger than that defined by , and that it hence provides a safe over-approximation of it.

By using Lemma 3.1, we know that

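$$
\begin{aligned}
\zeta^{(i)}_j = \sigma^{(i)}\big(\phi^{(i)}_j\big) &\geq \eta^{(i)}_{j,l}\,\zeta^{PWL,(i),L}_{j,l-1} + \eta^{(i)}_{j,l+1}\,\zeta^{PWL,(i),L}_{j,l},\\
\zeta^{(i)}_j = \sigma^{(i)}\big(\phi^{(i)}_j\big) &\leq \eta^{(i)}_{j,l}\,\zeta^{PWL,(i),U}_{j,l-1} + \eta^{(i)}_{j,l+1}\,\zeta^{PWL,(i),U}_{j,l},
\end{aligned}
$$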

where we notice that . By employing the Special Ordered Set (SOS) 2 reformulation of piecewise functions [milano2000benefits], we then obtain:

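$$
\begin{aligned}
&\sum_{l=1}^{M} y^{(i)}_{j,l} = 1, \qquad \sum_{l=1}^{M} \eta^{(i)}_{j,l} = 1,\\
&\phi^{(i)}_j = \sum_{l=1}^{M} \phi^{(i)L}_{j,l}\,\eta^{(i)}_{j,l}, \qquad y^{(i)}_{j,l} \leq \eta^{(i)}_{j,l} + \eta^{(i)}_{j,l+1},\\
&\sum_{l=1}^{M} \zeta^{PWL,(i),L}_{j,l}\,\eta^{(i)}_{j,l} \leq \zeta^{(i)}_j, \qquad \sum_{l=1}^{M} \zeta^{PWL,(i),U}_{j,l}\,\eta^{(i)}_{j,l} \geq \zeta^{(i)}_j,
\end{aligned}
$$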

which is equivalent to the Proposition statement.
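To make the role of the auxiliary variables concrete, the sketch below builds, for a given pre-activation value, the convex weights on the grid points and the binary segment indicator, and then checks the constraints as they are written above. It is an illustration under several assumptions: grid points are indexed from 0, the activation is the sigmoid restricted to its convex half, the upper grid values are the sigmoid evaluated at the grid points, and the lower grid values are the same values shifted down by the standard interpolation-error bound (step squared times a bound on the second derivative, divided by 8). This shifted-chord lower bound is chosen only for brevity and is not the midpoint-tangent construction of Section 3.1. No MILP solver is involved; all names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sos2_encode(phi, grid):
    """Return (eta, y): eta are convex weights on grid points, y marks the active segment."""
    M = len(grid) - 1
    eta = [0.0] * (M + 1)
    y = [0] * M
    for l in range(M):
        if grid[l] <= phi <= grid[l + 1]:
            t = (phi - grid[l]) / (grid[l + 1] - grid[l])
            eta[l], eta[l + 1] = 1.0 - t, t   # at most two adjacent non-zero weights
            y[l] = 1
            return eta, y
    raise ValueError("phi lies outside the grid")

def satisfies_constraints(phi, grid, zeta_low, zeta_up, tol=1e-9):
    """Check the linear constraints for the true activation value zeta = sigmoid(phi)."""
    eta, y = sos2_encode(phi, grid)
    zeta = sigmoid(phi)
    M = len(grid) - 1
    checks = [
        sum(y) == 1,
        abs(sum(eta) - 1.0) <= tol,
        abs(sum(g * e for g, e in zip(grid, eta)) - phi) <= tol,
        all(y[l] <= eta[l] + eta[l + 1] + tol for l in range(M)),
        sum(z * e for z, e in zip(zeta_low, eta)) <= zeta + tol,
        sum(z * e for z, e in zip(zeta_up, eta)) >= zeta - tol,
    ]
    return all(checks)

M = 8
grid = [-4.0 + 4.0 * l / M for l in range(M + 1)]   # convex half of the sigmoid
step = 4.0 / M
shift = 0.1 * step ** 2 / 8.0   # 0.1 upper-bounds |sigmoid''| (its maximum is about 0.096)
zeta_up = [sigmoid(g) for g in grid]
zeta_low = [u - shift for u in zeta_up]

all_ok = all(
    satisfies_constraints(-4.0 + 4.0 * k / 100, grid, zeta_low, zeta_up)
    for k in range(101)
)
print("constraints hold for all sampled pre-activations:", all_ok)
```

Every true input-output pair of the activation therefore has a feasible assignment of the auxiliary variables, which is the over-approximation property the proof relies on.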

#### Proof of Proposition 2

For simplicity of notation, we drop the subscripts and superscripts from the proof, and refer to a general activation of a general hidden layer of the NN .

Without loss of generality, assume the non-linearity is convex in , with (the concave case follows symmetrically from the convex one by suitably considering ).

Following the construction discussed in Section 3.1, the lower bound in this case is given by the tangent through the midpoint, i.e., , where , where . We consider the lower bounding error . By definition of convexity and differentiability of we have:

$$
\sigma(c) \geq \sigma(\phi_l) + (c - \phi_l)\,\sigma'(\phi_l).
$$

Hence, for the error we obtain the following chain of inequalities:

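$$
\begin{aligned}
e_1(\phi) = \sigma(\phi) - \sigma^{L}(\phi) &= \sigma(\phi) - \sigma(c) - (\phi - c)\,\sigma'(c)\\
&\leq -(c - \phi)\,\sigma'(\phi) - (\phi - c)\,\sigma'(c)\\
&= (\phi - c)\big(\sigma'(\phi) - \sigma'(c)\big),
\end{aligned}
$$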

which can be reformulated in terms of :

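$$
e_1(\phi) \leq \frac{\phi^{U} - \phi^{L}}{2M}\left(\sigma'(\phi_{l+1}) - \sigma'\!\left(\phi_{l+1} - \frac{\phi^{U} - \phi^{L}}{2M}\right)\right). \tag{5}
$$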

For the upper-bound function, we have: . Again by convexity we obtain:

$$
\sigma(\phi) \geq \sigma(\phi_l) - \sigma'(\phi_l)\,(\phi - \phi_l),
$$

so that for the error we have the following chain of inequalities:

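$$
\begin{aligned}
e_2(\phi) = \sigma^{U}(\phi) - \sigma(\phi) &= \sigma(\phi_l) + (\phi - \phi_l)\,\frac{\sigma(\phi_{l+1}) - \sigma(\phi_l)}{\phi_{l+1} - \phi_l} - \sigma(\phi)\\
&\leq \left(\frac{\sigma(\phi_{l+1}) - \sigma(\phi_l)}{\phi_{l+1} - \phi_l} + \sigma'(\phi_l)\right)(\phi - \phi_l).
\end{aligned}
$$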

Hence, by rewriting it in terms of , we obtain:

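$$
e_2(\phi) \leq \left(\frac{\sigma\!\left(\phi_l + \frac{\phi^{U} - \phi^{L}}{M}\right) - \sigma(\phi_l)}{\frac{\phi^{U} - \phi^{L}}{M}} + \sigma'(\phi_l)\right)\frac{\phi^{U} - \phi^{L}}{M}. \tag{6}
$$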

Uniform convergence as tends to infinity follows straightforwardly from the fact that Equations (5) and (6) are independent of any particular value of and that they tend to zero as goes to infinity.
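As a numerical sanity check of this limiting behaviour, the right-hand sides of Equations (5) and (6) can be evaluated for growing grid sizes. The sketch below is illustrative only: it assumes the sigmoid activation, the interval `[-5, 0]` (on which the sigmoid is convex) and a uniform grid, and takes the maximum of each bound over all grid segments. Function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dsigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def uniform_grid(M, phi_L, phi_U):
    return [phi_L + (phi_U - phi_L) * l / M for l in range(M + 1)]

def bound_eq5(M, phi_L=-5.0, phi_U=0.0):
    """Right-hand side of Equation (5), maximised over the segments."""
    half = (phi_U - phi_L) / (2 * M)
    grid = uniform_grid(M, phi_L, phi_U)
    # grid[1:] enumerates the right endpoints phi_{l+1} of the segments.
    return max(half * (dsigmoid(g) - dsigmoid(g - half)) for g in grid[1:])

def bound_eq6(M, phi_L=-5.0, phi_U=0.0):
    """Right-hand side of Equation (6), maximised over the segments."""
    step = (phi_U - phi_L) / M
    grid = uniform_grid(M, phi_L, phi_U)
    # grid[:-1] enumerates the left endpoints phi_l of the segments.
    return max(
        ((sigmoid(g + step) - sigmoid(g)) / step + dsigmoid(g)) * step
        for g in grid[:-1]
    )

table = [(M, bound_eq5(M), bound_eq6(M)) for M in (4, 16, 64, 256)]
for M, b5, b6 in table:
    print(f"M={M:4d}  Eq.(5) bound={b5:.6f}  Eq.(6) bound={b6:.6f}")
```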

#### Proof of Theorem 1

The theorem statement follows if we show that the feasible region of the MILP of Equation (4) over-approximates the feasible region of the individual fairness optimisation problem whose constraints are given in Equations (2) and (3). In fact, if this holds then any solution of the optimisation problem of Equation (4) would provide an upper bound to the solution of Problem 1, so that for any we would have that is --IF.

Fairness Constraint: For the fairness constraint, this follows directly from the construction of Section 3.1, so that we have that implies .

Model Constraint: We first rewrite the NN explicitly by using the notation of Equation (1) in and , so that we have , and for :

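$$
\begin{aligned}
\zeta'^{(0)} &= x', & \phi'^{(i)} &= W^{(i)}\zeta'^{(i-1)} + b'^{(i)}, & \zeta'^{(i)} &= \sigma^{(i)}\big(\phi'^{(i)}\big),\\
\zeta''^{(0)} &= x'', & \phi''^{(i)} &= W^{(i)}\zeta''^{(i-1)} + b''^{(i)}, & \zeta''^{(i)} &= \sigma^{(i)}\big(\phi''^{(i)}\big).
\end{aligned}
$$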

The first two constraints in each of the two rows above are already linear constraints, and appear in this form in the MILP formulation. For the activation constraints, i.e.  for and , we proceed by computing PWL lower and upper bound functions using Lemma 3.1 and converting them into MILP form using Proposition 1. This yields the final form of the MILP.
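The layer-wise recursion above, applied to a pair of inputs, can be sketched in a few lines. The example below uses a toy two-layer network with made-up weights (all values are hypothetical) and evaluates the output difference for one specific pair. Evaluating a single pair only gives a lower bound on the worst-case output difference; the MILP is what provides the upper bound over all similar pairs.

```python
import math

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid_vec(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def affine(W, b, z):
    """phi = W z + b for a weight matrix given as a list of rows."""
    return [sum(w * a for w, a in zip(row, z)) + bi for row, bi in zip(W, b)]

def forward(x, layers):
    """zeta^(0) = x; phi^(i) = W^(i) zeta^(i-1) + b^(i); zeta^(i) = sigma^(i)(phi^(i))."""
    zeta = list(x)
    for W, b, activation in layers:
        phi = affine(W, b, zeta)
        zeta = activation(phi)
    return zeta

# Hypothetical toy network: 2 inputs -> 2 hidden ReLU units -> 1 sigmoid output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, -1.0]], [0.0], sigmoid_vec),
]

x_prime = [1.0, 0.0]
x_second = [1.0, 0.2]
output_gap = abs(forward(x_prime, layers)[0] - forward(x_second, layers)[0])
print(f"|f(x') - f(x'')| = {output_gap:.4f}")
```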

## Appendix B Metric Learning

Recently, a line of work aimed at practical methods of learning more expressive fair distance metrics from data has been developed [ilvento19metric, mukherjee20simple, yurochkin2020training]. In this section we expand on the methodology used for metric learning in our experiments.

### B.1 Mahalanobis

For the learning of the similarity metric in the form of a Mahalanobis distance, we rely on the techniques described in yurochkin2020training that form the basis of the SenSR approach (to which we compare in our experiments). Briefly, this works as follows. Consider for simplicity the case of one sensitive feature (e.g., race) with $K$ possible categorical values. We train a softmax model to predict each value of the sensitive feature by relying on the non-sensitive features. Let $x_{\text{non-sens}}$ denote the feature vector corresponding to only the non-sensitive features, and similarly let $x_{\text{sens}}$ denote the sensitive features. We then have:

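$$
p(x_{\text{sens}} = k) = \frac{\exp\!\big(a_k^{T} x_{\text{non-sens}} + b_k\big)}{\sum_{k'=1}^{K} \exp\!\big(a_{k'}^{T} x_{\text{non-sens}} + b_{k'}\big)}, \qquad k = 1, \ldots, K \tag{7}
$$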

where indicates the confidence given by the softmax model to the sensitive feature having the -th value. Intuitively, the vector , for , then represents a sensitive direction in the non-sensitive features space that correlates to the -th value of . We then stack the weights of each model, defining the matrix , and compute its matrix span , which combines all the sensitive directions in defining a sensitive subspace. We finally find its orthogonal projector , which is then used to define the Mahalanobis distance metric as: .

In the case in which the sensitive feature has a continuous rather than a categorical value, the softmax model of Equation (7) can be replaced by a linear fitting model, and the remainder of the computation follows analogously. Finally, we remark that in the case in which many features are selected as sensitive, one can proceed similarly to what has been described just above, by learning a different model for each sensitive feature, and then stacking all the weights obtained together when defining the matrix .
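The projection step can be sketched as follows. Since the explicit distance formula is not reproduced in this text, the example assumes the usual construction in which the projector removes the component of the difference vector lying in the span of the sensitive directions, and the distance is the Euclidean norm of what remains. The sensitive directions below are made-up vectors standing in for the learnt softmax weights, and Gram-Schmidt orthogonalisation is used here purely to keep the example dependency-free.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(directions, tol=1e-12):
    """Gram-Schmidt: orthonormal basis of the span of the given sensitive directions."""
    basis = []
    for a in directions:
        r = list(a)
        for q in basis:
            c = dot(q, r)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = math.sqrt(dot(r, r))
        if norm > tol:   # skip directions already contained in the span
            basis.append([ri / norm for ri in r])
    return basis

def fair_distance(x1, x2, basis):
    """sqrt(v^T P v) with v = x1 - x2 and P = I - Q Q^T (assumed form of the metric)."""
    v = [a - b for a, b in zip(x1, x2)]
    squared = dot(v, v) - sum(dot(q, v) ** 2 for q in basis)
    return math.sqrt(max(squared, 0.0))   # clamp tiny negative rounding errors

# Hypothetical sensitive directions spanning the first two coordinates.
sensitive_directions = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(sensitive_directions)

d_sensitive_only = fair_distance([0.0, 0.0, 0.0], [3.0, -1.0, 0.0], basis)
d_mixed = fair_distance([0.0, 0.0, 0.0], [3.0, -1.0, 2.0], basis)
print(d_sensitive_only, d_mixed)
```

Two individuals that differ only along sensitive directions are at distance zero under this metric, so any difference in their outputs counts directly against the decision tolerance.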

### B.2 Weighted ℓp

For ease of comparison, we rely on the approach of john2020verifying, which in particular focuses on a weighted metric, by setting up the weights to zero for the sensitive features and to a common for all the remaining features (we remark that our method is not limited just to , but can be used for any general weighted metric). In the experiments described in Section 4 of the main paper, we consider multiple values for varying from to .
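A weighted metric of this kind can be sketched as below. The exact placement of the weights inside the norm is an assumption of this example (each coordinate difference is scaled by its weight before the norm is taken), as are the feature layout and the common weight value `theta`; none of them is taken from the experiments.

```python
import math

def weighted_lp(x1, x2, weights, p=math.inf):
    """Weighted l_p distance; zero weights make a feature invisible to the metric."""
    diffs = [w * abs(a - b) for w, a, b in zip(weights, x1, x2)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

# Hypothetical layout: the second feature is sensitive, the others share weight theta.
is_sensitive = [False, True, False]
theta = 0.5
weights = [0.0 if s else theta for s in is_sensitive]

x = [1.0, 0.0, 2.0]
x_flip_sensitive = [1.0, 1.0, 2.0]   # differs only in the sensitive feature
x_other = [1.4, 1.0, 2.0]            # also differs in a non-sensitive feature

d_flip = weighted_lp(x, x_flip_sensitive, weights)
d_other_inf = weighted_lp(x, x_other, weights)
d_other_l2 = weighted_lp(x, x_other, weights, p=2)
print(d_flip, d_other_inf, d_other_l2)
```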

### B.3 Feature Embedding

In addition to the Mahalanobis and weighted distance metric, we also allow for the metric to be computed on an embedding. Intuitively, this allows for more flexibility in modelling the intra-relationship between the sensitive and non-sensitive features in each data point and can be used to certify individual fairness in data representations such as those discussed by [ruoss2020learning]. As a proof of concept, we do this by learning a one-layer neural network embedding of neurons, and employ the weighted metric. Results for this analysis will be given in Section D.2.

## Appendix C Experimental Setting

In this section we describe the datasets used in this paper and any preprocessing performed prior to training and certification. We then report the hyperparameter values used to train the different models used in the experiments. All experiments were run on an NVIDIA 2080Ti GPU with a 20-core Intel Core Xeon 6230.

### C.1 UCI Datasets

We consider the following UCI datasets [UCIDatasets], popular in the fairness literature, with the first three being binary classification tasks and the last one being a regression task. For all datasets we take an 80/20 train/test split, drop features with missing values, normalise continuous features and one-hot encode categorical features.

Adult: the objective is to classify whether individuals earn more or less than \$50K/year (binary classification). Here we follow similar preprocessing steps as yurochkin2020training. After removing native-country and education, and preprocessing, the dataset has 40 features, 45,222 data points, a 0.24/0.76 class imbalance, and we consider sex and race to be categorical sensitive attributes.

Credit: the goal is to predict whether people will default on their payments (binary classification). After preprocessing, the dataset has 144 features, 30,000 data points, a 0.22/0.78 class imbalance, and x2 (corresponding to sex) is considered a sensitive attribute.

German: the goal is to classify individuals as good/bad credit risks (binary classification). After preprocessing, the dataset has 58 features, 1000 data points, a 0.3/0.7 class imbalance and status_sex is considered a categorical sensitive attribute.

Crime: the goal is to predict the normalised total number of violent crimes per 100K population. After preprocessing, the dataset has 97 features, 1993 data points, and racepctblack, racePctWhite, racePctAsian, racePctHisp are considered continuous sensitive attributes. The true label distribution of this dataset is very imbalanced, as shown in Figure 6.
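The generic preprocessing steps listed at the start of this subsection can be sketched as follows. This is an illustration on made-up records, not the pipeline used for the experiments: it assumes that "normalise" means standardising to zero mean and unit variance, that a feature is dropped as soon as any record lacks a value for it, and that the split is a seeded random shuffle. All column names are hypothetical.

```python
import random

def one_hot(values):
    """One-hot encode a categorical column using sorted category order."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]

def standardise(values):
    """Zero-mean, unit-variance scaling (constant columns are left at zero)."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5 or 1.0
    return [(v - mean) / std for v in values]

def preprocess(rows, continuous, categorical):
    """Drop features with missing values, then normalise / one-hot encode the rest."""
    kept = [k for k in continuous + categorical
            if all(r.get(k) is not None for r in rows)]
    blocks = []
    for k in kept:
        column = [r[k] for r in rows]
        if k in continuous:
            blocks.append([[v] for v in standardise(column)])
        else:
            blocks.append(one_hot(column))
    # Concatenate the per-feature blocks into one vector per record.
    return [sum((block[i] for block in blocks), []) for i in range(len(rows))]

def train_test_split(data, test_fraction=0.2, seed=0):
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    cut = int(round(len(data) * (1.0 - test_fraction)))
    return [data[i] for i in indices[:cut]], [data[i] for i in indices[cut:]]

# Hypothetical records: "income" has a missing value and is therefore dropped.
rows = [{"age": 20 + i, "sex": "F" if i % 2 else "M",
         "income": None if i == 3 else 1.0} for i in range(10)]
features = preprocess(rows, continuous=["age", "income"], categorical=["sex"])
train, test = train_test_split(features)
print(len(features[0]), len(train), len(test))
```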

### C.2 Hyperparameters

The hyperparameters used to train all of the FTU, SenSR and MILP models used in the experiments are reported in Table 1. The hidden layer values were selected to match the type of models trained in related literature (e.g. yurochkin2020training, Urban2020PerfectlyNetworks, ruoss2020learning). The values of learning rate, regularisation and number of epochs were selected through hyperparameter tuning, so as to provide accuracy results matching those found in the literature.

### C.3 Training Methods

Below we describe the alternative fair training methods that are employed for comparison with our proposed training method. We note that for all methods, categorical variables are one-hot encoded, and, since MILP solvers can deal with both continuous and integer variables, no further processing is required.

#### Fairness through unawareness (FTU)

The general principle of fairness through unawareness training is that by removing the sensitive features (e.g. features containing information about gender or race) the classifier will no longer use such information to make decisions. Despite removal of the sensitive features, it is often found that these have correlations with non-sensitive features, which can lead to classifiers that are still greatly influenced by the sensitive features [pedreshi2008discrimination].

#### SenSR

SenSR is a methodology proposed by yurochkin2020training that leverages PGD to generate individually unfair adversarial examples to augment the training procedure. It supports similarity metrics in the form of a Mahalanobis distance, akin to the one we describe in Subsection B.1. We adapt their code to work on both binary classification and regression tasks to compare with our MILP method. Our MILP method bears many similarities to theirs, which is why we use it for comparison. However, while both training methods rely on adversarial training to mitigate unfairness, SenSR does not provide any verification methodology. Furthermore, our MILP training, while being considerably more computationally intensive, achieves better local optimisation and thus, upon verification, proves to train models that are orders of magnitude fairer than SenSR.