1 Introduction
Reservations have been raised about the application of neural networks (NN) in contexts where fairness is of concern [Barocas2018BigImpact]. Because of inherent biases present in real-world data, if unchecked, these models have been found to discriminate against individuals on the basis of sensitive features, such as race or sex [Bolukbasi2016ManEmbeddings, Angwin2016MachineBlacks]. Recently, the topic has come under the spotlight, with technologies being increasingly challenged for bias [hardesty_2018, kirk2021OutOfTheBox, the_guardian_2020], leading to the introduction of a range of definitions and techniques for capturing the multifaceted properties of fairness.
Fairness approaches are broadly categorised into: group fairness [Hardt2016EqualityLearning], which inspects the model over data demographics; and individual fairness (IF) [Dwork2012FairnessAwareness], which considers the behaviour over each individual. Despite its wider adoption, group fairness is only concerned with statistical properties of the model, so that a situation may arise where predictions of a group-fair model can be perceived as unfair by a particular individual. In contrast, IF is a worst-case measure with guarantees over every possible individual in the input space. However, while techniques exist for group fairness of NNs [albarghouthi2017fairsquare, Bastani2018ProbabilisticConcentration], research on IF has thus far been limited to designing training procedures that favour fairness [yurochkin2020training, yeom2020individual, McNamara2017ProvablyRepresentations] and verification over specific individuals [ruoss2020learning]. To the best of our knowledge, there is currently no work targeted at global certification of IF for NNs.
We develop an anytime algorithm with provable bounds for the certification of IF on NNs. We build on the IF formalisation employed by john2020verifying. That is, given $\epsilon, \delta \geq 0$ and a distance metric $d_{\text{fair}}$ that captures the similarity between individuals, we ask that, for every pair of points $x'$ and $x''$ in the input space with $d_{\text{fair}}(x', x'') \leq \epsilon$, the NN's output does not differ by more than $\delta$. Although related to it, IF certification on NNs poses a different problem than adversarial robustness [Tjeng2019EvaluatingProgramming], as both $x'$ and $x''$ are here problem variables, spanning the whole space. Hence, local approximation techniques developed in the adversarial literature cannot be employed in the context of IF.
Nevertheless, we show how this global, non-linear requirement can be encoded in Mixed-Integer Linear Programming (MILP) form, by deriving a set of global upper and lower piecewise-linear (PWL) bounds over each activation function in the NN over the whole input space, and performing linear encoding of the (generally non-linear) similarity metric $d_{\text{fair}}$. The formulation of our optimisation as a MILP allows us to compute an anytime, worst-case bound on IF using standard solvers from the global optimisation literature [dantzig2016linear]. Furthermore, we demonstrate how our approach can be embedded into the NN training so as to optimise for individual fairness at training time. We do this by performing gradient descent on a weighted loss that also accounts for the maximum variation in neighbourhoods for each training point, similarly to what is done in adversarial learning [goodfellow2014explaining, Gowal2018OnModels, wicker2021bayesian].
We apply our method on four benchmarks widely employed in the fairness literature, namely, the Adult, German, Credit and Crime datasets [UCIDatasets], and an array of similarity metrics learnt from data that include weighted $\ell_p$, Mahalanobis, and NN embeddings. We empirically demonstrate how our method is able to provide the first, non-trivial IF certificates for NNs commonly employed for tasks from the IF literature, and even larger NNs comprising up to thousands of neurons. Furthermore, we find that our MILP-based fair training approach consistently outperforms, in terms of IF guarantees, NNs trained with a competitive state-of-the-art technique by orders of magnitude, albeit at an increased computational cost.
The paper makes the following main contributions (proofs and additional details can be found in the Appendix of an extended version of the paper, available at http://www.fun2model.org/bibitem.php?key=BPW+22):

We design a MILP-based, anytime verification approach for the certification of IF as a global property on NNs.

We demonstrate how our technique can be used to modify the loss function of a NN to take into account certification of IF at training time.

On four datasets, and an array of metrics, we show how our techniques obtain non-trivial IF certificates and train NNs that are significantly fairer than the state of the art.
Related Work
A number of works have considered IF by employing techniques from adversarial robustness. yeom2020individual rely on randomized smoothing to find the highest stable per-feature difference in a model. Their method, however, provides only (weak) guarantees on model statistics. yurochkin2020training present a method for IF training that builds on projected gradient descent and optimal transport. While the method is found to decrease model bias to state-of-the-art results, no formal guarantees are obtained. ruoss2020learning adapted the MILP formulation for adversarial robustness to handle fair metric embeddings. However, rather than tackling the IF problem globally as introduced by Dwork2012FairnessAwareness, the method only works iteratively on a finite set of data, hence leaving open the possibility of unfairness in the model. In contrast, the MILP encoding we obtain through PWL bounding of activations and similarity metrics allows us to provide guarantees over any possible pair of individuals. Urban2020PerfectlyNetworks employ static analysis to certify causal fairness. While this method yields global guarantees, it cannot be straightforwardly employed for IF, and it is not anytime, making exhaustive analysis impractical. john2020verifying present a method for the computation of IF, though limited to linear and kernel models. MILP and linear relaxation have been employed to certify NNs in local adversarial settings [ehlers2017formal, Tjeng2019EvaluatingProgramming, wicker2020probabilistic]. However, local approximations cannot be employed for the global IF problem. While katz2017reluplex and leino2021globally consider global robustness, their methods are restricted to $\ell_p$ metrics. Furthermore, they require the knowledge of a Lipschitz constant or are limited to ReLU.
2 Individual Fairness
We focus on regression and binary classification with NNs with real-valued inputs and one-hot encoded categorical features (multi-class problems can be tackled with component-wise analyses). Such frameworks are often used in automated decision-making, e.g. for loan applications [Hardt2016EqualityLearning]. Formally, given a compact input set $X \subseteq \mathbb{R}^n$ and an output set $Y \subseteq \mathbb{R}$, we consider an $L$-layer fully-connected NN $f^w : X \to Y$, parameterised by a vector of weights $w$ trained on a dataset $D$. For an input $x \in X$, $i = 1, \ldots, L$ and $j = 1, \ldots, n_i$, the NN is defined as:
$$\phi_j^{(i)} = \sum_{k=1}^{n_{i-1}} W_{jk}^{(i)} \zeta_k^{(i-1)} + b_j^{(i)}, \qquad \zeta_j^{(i)} = \sigma^{(i)}\left(\phi_j^{(i)}\right), \tag{1}$$
where $\zeta_j^{(0)} = x_j$. Here, $n_i$ is the number of units in the $i$th layer, $W_{jk}^{(i)}$ and $b_j^{(i)}$ are its weights and biases, $\sigma^{(i)}$ is the activation function, $\phi^{(i)}$ is the pre-activation and $\zeta^{(i)}$ the activation. The NN output is the result of these computations, $f^w(x) := \zeta^{(L)}$. In regression, $f^w(x)$ is the prediction, while for classification it represents the class probability. In this paper we focus on fully-connected NNs, as these are widely employed in the IF literature [yurochkin2020training, Urban2020PerfectlyNetworks, ruoss2020learning]. However, we stress that our framework, being based on MILP, can be easily extended to convolutional, max-pool and batch-norm layers or ResNets by using embedding techniques from the adversarial robustness literature (see e.g. [boopathy2019cnn]).
Individual Fairness
Given a NN $f^w$, IF [Dwork2012FairnessAwareness] enforces the property that similar individuals are similarly treated. Similarity is defined according to a task-dependent pseudometric, $d_{\text{fair}} : X \times X \to \mathbb{R}_{\geq 0}$, provided by a domain expert (e.g., a Mahalanobis distance correlating each feature to the sensitive one), whereas similarity of treatment is expressed via the absolute difference on the NN output $f^w(x)$. We adopt the IF formulation of john2020verifying for the formalisation of input-output IF similarity.
Definition 1 ($\epsilon$-$\delta$-IF [john2020verifying]).
Consider $\epsilon \geq 0$ and $\delta \geq 0$. We say that $f^w$ is $\epsilon$-$\delta$-individually fair w.r.t. $d_{\text{fair}}$ iff
$$\forall x', x'' \in X \ \text{s.t.}\ d_{\text{fair}}(x', x'') \leq \epsilon \implies |f^w(x') - f^w(x'')| \leq \delta.$$
Here, $\epsilon$ measures similarity between individuals and $\delta$ is the difference in outcomes (class probability for classification).
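Because the definition quantifies over every pair of points in the input space, sampling can only falsify it, never certify it. The following minimal Python sketch illustrates such a falsification check on a finite set of points; the function name `find_if_violation`, the toy model and the toy similarity are all assumptions introduced for illustration.

```python
from itertools import combinations

def find_if_violation(f, d_fair, points, eps, delta):
    """Return a pair from `points` that violates eps-delta-IF, or None.

    A returned pair disproves fairness; None proves nothing, since only
    the sampled pairs were inspected.
    """
    for x1, x2 in combinations(points, 2):
        if d_fair(x1, x2) <= eps and abs(f(x1) - f(x2)) > delta:
            return x1, x2
    return None

# Toy model (assumption): the second feature is sensitive and leaks into the output.
f = lambda x: 0.5 * x[0] + 0.4 * x[1]
# Toy similarity (assumption): the sensitive feature is ignored entirely.
d_fair = lambda a, b: abs(a[0] - b[0])

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
print(find_if_violation(f, d_fair, points, eps=0.1, delta=0.2))
```

The first two points differ only in the sensitive feature, so they are at distance zero, yet their outputs differ by 0.4; this is the kind of pair a global certificate has to rule out everywhere.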
We emphasise that individual fairness is a global notion, as the condition in Definition 1 must hold for all pairs of points in $X$.
We remark that the IF formulation of john2020verifying (which is more general than the IF formulation typically used in the literature [yurochkin2020training, ruoss2020learning]) is a slight variation on the Lipschitz property introduced by Dwork2012FairnessAwareness.
While introducing greater flexibility thanks to its parametric form, it makes an IF parametric analysis necessary at test time.
In Section 4 we analyse how IF of NNs is affected by variations of $\epsilon$ and $\delta$.
A crucial component of IF is the similarity $d_{\text{fair}}$. The intuition is that sensitive features, or their sensitive combination, should not influence the NN output. While a number of metrics have been discussed in the literature [ilvento19metric], we focus on the following representative set of metrics which can be automatically learnt from data [john2020verifying, ruoss2020learning, mukherjee20simple, yurochkin2020training]. Details on metric learning are given in Appendix B.
Weighted $\ell_p$:
In this case $d_{\text{fair}}$ is defined as a weighted version of an $\ell_p$ metric, i.e. $d_{\text{fair}}(x', x'') = \left(\sum_{i=1}^{n} \theta_i |x'_i - x''_i|^p\right)^{1/p}$.
Intuitively, we set the weights related to sensitive features to zero, so that two individuals are considered similar if they only differ with respect to those.
The weights for the remaining features can be tuned according to their degree of correlation to the sensitive features.
Mahalanobis:
In this case we have $d_{\text{fair}}(x', x'') = \sqrt{(x' - x'')^T S (x' - x'')}$, for a given symmetric positive semi-definite (SPD) matrix $S$.
The Mahalanobis distance generalises the $\ell_2$ metric by taking into account the intra-correlation of features to capture latent dependencies w.r.t. the sensitive features.
Feature Embedding:
The metric is computed on an embedding, so that $d_{\text{fair}}(x', x'') = \hat{d}(\varphi(x'), \varphi(x''))$, where $\hat{d}$ is either the Mahalanobis or the weighted $\ell_p$ metric, and $\varphi$ is a feature embedding map. These allow for greater modelling flexibility, at the cost of reduced interpretability.
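The three families of metrics can be sketched in a few lines of plain Python. This is an illustrative sketch only: the function names, the weight vector `theta`, the matrix `S` given as nested lists, and the embedding callable `phi` are assumptions, not the learnt metrics used in the experiments.

```python
import math

def weighted_lp(x1, x2, theta, p=2.0):
    """Weighted l_p distance; a zero weight makes that feature irrelevant."""
    total = sum(t * abs(a - b) ** p for a, b, t in zip(x1, x2, theta))
    return total ** (1.0 / p)

def mahalanobis(x1, x2, S):
    """sqrt((x1 - x2)^T S (x1 - x2)) for a symmetric PSD matrix S (nested lists)."""
    diff = [a - b for a, b in zip(x1, x2)]
    n = len(diff)
    quad = sum(diff[i] * S[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(quad, 0.0))  # guard against tiny negative round-off

def embedded(x1, x2, phi, base_metric):
    """Apply a base metric to the embedded points phi(x1) and phi(x2)."""
    return base_metric(phi(x1), phi(x2))

# Two individuals that differ only in the (zero-weighted) sensitive feature.
print(weighted_lp((0.3, 1.0), (0.3, 0.0), theta=(1.0, 0.0)))
```

With the sensitive weight set to zero, the two individuals above are at distance zero, so any difference in the NN output between them counts directly against the fairness bound.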
2.1 Problem Formulation
We aim at certifying IF for NNs. To this end we formalise two problems: computing certificates and training for IF.
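Problem 1 (Fairness Certification).
Given a trained NN $f^w$, a similarity $d_{\text{fair}}$ and a distance threshold $\epsilon \geq 0$, compute
$$\delta_{\max} = \max_{x', x'' \in X,\ d_{\text{fair}}(x', x'') \leq \epsilon} |f^w(x') - f^w(x'')|.$$
Problem 1 provides a formulation in terms of optimisation, seeking to compute the maximum output change $\delta_{\max}$ for any pair of input points whose distance is no more than $\epsilon$. One can then compare $\delta_{\max}$ with any threshold $\delta$: if $\delta_{\max} \leq \delta$ holds then the model has been certified to be $\epsilon$-$\delta$-IF.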
While Problem 1 is concerned with an already trained NN, the methods we develop can also be employed to encourage IF at training time. Similarly to the approaches for adversarial learning [goodfellow2014explaining], we modify the training loss to balance between the model fit and IF.
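Problem 2 (Fairness Training).
Consider an NN $f^w$, a training set $D$, a similarity metric $d_{\text{fair}}$ and a distance threshold $\epsilon \geq 0$. Let $\lambda \in [0, 1]$ be a constant. Define the IF-fair loss as
$$L_{\text{fair}}(f^w(x_i), y_i, f^w(x_i^*), \lambda) = \lambda L(f^w(x_i), y_i) + (1 - \lambda) |f^w(x_i) - f^w(x_i^*)|,$$
where $L$ is the standard training loss and $x_i^* = \arg\max_{x \in X\ \text{s.t.}\ d_{\text{fair}}(x_i, x) \leq \epsilon} |f^w(x_i) - f^w(x)|$. The IF training problem is defined as finding $w_{\text{fair}}$ s.t.:
$$w_{\text{fair}} = \arg\min_{w} \sum_{i} L_{\text{fair}}(f^w(x_i), y_i, f^w(x_i^*), \lambda).$$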
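As a small illustration of the trade-off in Problem 2, the sketch below evaluates a loss of the form $\lambda L + (1 - \lambda)|f^w(x_i) - f^w(x_i^*)|$ for a single training point. That form, the function name `fair_loss` and the squared-error base loss are assumptions of this sketch; the worst-case prediction `pred_star` is taken as given here.

```python
def fair_loss(pred, target, pred_star, lam, base_loss):
    """Convex combination of the standard loss and the worst-case output gap.

    pred      : model output on the training point
    pred_star : model output on the worst-case similar point (assumed given)
    lam       : 1.0 recovers standard training, 0.0 penalises the gap only
    """
    return lam * base_loss(pred, target) + (1.0 - lam) * abs(pred - pred_star)

squared_error = lambda p, y: (p - y) ** 2

for lam in (1.0, 0.5, 0.0):
    print(lam, fair_loss(0.8, 1.0, 0.3, lam, squared_error))
```

The two terms pull in different directions when the best-fitting model relies on features correlated with the sensitive ones, which is why the weighting constant matters in practice.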
3 A MILP Approach For Individual Fairness
Certification of individual fairness on a NN thus requires us to solve the following global, non-convex optimisation problem:
$$\max_{x', x'' \in X} |\delta| \quad \text{subject to} \quad \delta = f^w(x') - f^w(x''), \tag{2}$$
$$d_{\text{fair}}(x', x'') \leq \epsilon. \tag{3}$$
We develop a Mixed-Integer Linear Programming (MILP) over-approximation (i.e., providing a sound bound) to this problem. We notice that there are two sources of non-linearity here, one induced by the NN (Equation (2)), which we refer to as the model constraint, and the other by the fairness metric (Equation (3)), which we call the fairness constraint. In the following, we show how these can be modularly bounded by piecewise-linear functions. In Section 3.3 we bring the results together to derive a MILP formulation for IF.
3.1 Model Constraint
We develop a scheme based on piecewise-linear (PWL) upper and lower bounding for over-approximating all commonly used non-linear activation functions. An illustration of the PWL bound is given in Figure 1. Let $\phi_j^{(i)L}$ and $\phi_j^{(i)U}$ be lower and upper bounds on the pre-activation $\phi_j^{(i)}$ (computed by bound propagation over $X$ [Gowal2018OnModels]). We proceed by building a discretisation grid over the $\phi_j^{(i)}$ values on $M$ grid points: $\phi_{\text{grid}} = [\phi_{j,0}^{(i)}, \ldots, \phi_{j,M}^{(i)}]$, with $\phi_{j,0}^{(i)} := \phi_j^{(i)L}$ and $\phi_{j,M}^{(i)} := \phi_j^{(i)U}$, such that, in each partition interval $[\phi_{j,l-1}^{(i)}, \phi_{j,l}^{(i)}]$, we have that $\sigma^{(i)}$ is either convex or concave. We then compute linear lower and upper bound functions for $\sigma^{(i)}$ in each $[\phi_{j,l-1}^{(i)}, \phi_{j,l}^{(i)}]$ as follows. If $\sigma^{(i)}$ is convex (resp. concave) in $[\phi_{j,l-1}^{(i)}, \phi_{j,l}^{(i)}]$, then an upper (resp. lower) linear bound is given by the segment connecting the two end points of the interval, and a lower (resp. upper) linear bound is given by the tangent through the midpoint of the interval. We then compute the values of each linear bound at each of its grid points, and select the minimum of the lower bound values and the maximum of the upper bound values, which we store in two vectors $\zeta_j^{(i),\text{PWL},L}$ and $\zeta_j^{(i),\text{PWL},U}$. The following lemma is a consequence of this construction.
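Lemma 1.
Let $\phi \in [\phi_j^{(i)L}, \phi_j^{(i)U}]$. Denote with $l$ the index associated to the partition of $\phi_{\text{grid}}$ in which $\phi$ falls and consider $\eta \in [0, 1]$ such that $\phi = \eta \phi_{j,l-1}^{(i)} + (1 - \eta) \phi_{j,l}^{(i)}$. Then:
$$\eta \zeta_{j,l-1}^{(i),\text{PWL},L} + (1 - \eta) \zeta_{j,l}^{(i),\text{PWL},L} \leq \sigma^{(i)}(\phi) \leq \eta \zeta_{j,l-1}^{(i),\text{PWL},U} + (1 - \eta) \zeta_{j,l}^{(i),\text{PWL},U},$$
that is, $\zeta_j^{(i),\text{PWL},L}$ and $\zeta_j^{(i),\text{PWL},U}$ define continuous PWL lower and upper bounds for $\sigma^{(i)}$ in $[\phi_j^{(i)L}, \phi_j^{(i)U}]$.
Lemma 1 guarantees that we can bound the non-linear activation functions using PWL functions. Crucially, PWL functions can then be encoded into the MILP constraints.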
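The chord-and-tangent construction can be checked numerically. The sketch below builds the two vectors of bound values for the sigmoid on a grid and interpolates them linearly; it assumes a uniform grid that contains the inflection point at zero, and the function names are illustrative rather than taken from the reference implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def pwl_bound_values(grid):
    """Values of the PWL lower/upper bounds of the sigmoid at each grid point.

    `grid` must be sorted and contain 0.0 if it straddles the inflection
    point, so that the sigmoid is convex or concave on every interval.
    """
    lower = [math.inf] * len(grid)
    upper = [-math.inf] * len(grid)
    for l in range(1, len(grid)):
        a, b = grid[l - 1], grid[l]
        mid = 0.5 * (a + b)
        convex = b <= 0.0  # sigmoid is convex on (-inf, 0], concave on [0, inf)
        for idx in (l - 1, l):
            x = grid[idx]
            chord = sigmoid(x)  # the chord meets the sigmoid at both end points
            tangent = sigmoid(mid) + sigmoid_grad(mid) * (x - mid)
            lo, up = (tangent, chord) if convex else (chord, tangent)
            lower[idx] = min(lower[idx], lo)  # minimum over adjacent lower bounds
            upper[idx] = max(upper[idx], up)  # maximum over adjacent upper bounds
    return lower, upper

def interpolate(grid, values, x):
    """Evaluate the PWL function through (grid[l], values[l]) at x."""
    for l in range(1, len(grid)):
        if grid[l - 1] <= x <= grid[l]:
            eta = (grid[l] - x) / (grid[l] - grid[l - 1])
            return eta * values[l - 1] + (1.0 - eta) * values[l]
    raise ValueError("x lies outside the grid")

grid = [float(v) for v in range(-6, 7)]  # 12 unit-width intervals on [-6, 6]
lower, upper = pwl_bound_values(grid)
print(interpolate(grid, lower, 0.5), sigmoid(0.5), interpolate(grid, upper, 0.5))
```

Taking the minimum and maximum at shared grid points is what keeps the interpolated bounds sound across neighbouring intervals, at the price of a slightly looser bound at those points.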
Proposition 1.
A proof can be found in Appendix A. Proposition 1 ensures that the global behaviour of each NN neuron can be over-approximated by linear constraints using auxiliary variables. Employing Proposition 1 we can encode the model constraint of Equation (2) into the MILP form in a sound way.
The over-approximation error does not depend on the MILP formulation (which is exact), but on the PWL bounding, and is hence controllable through the selection of the number of grid points $M$, and becomes exact in the limit. Notice that in the particular case of ReLU activation functions the over-approximation is exact for any $M$.
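For orientation, PWL bounds of this kind are commonly turned into mixed-integer linear constraints through a Special Ordered Set of type 2 (SOS2) encoding. The block below sketches the standard textbook form for a single neuron; the auxiliary variable names ($y_l$ binary, $\eta_l$ continuous) and the arrangement of the constraints are assumptions of this sketch, and neuron and layer indices are dropped for readability.

```latex
\begin{align*}
& \sum_{l=1}^{M} y_l = 1, \qquad \sum_{l=0}^{M} \eta_l = 1, \qquad
  y_l \in \{0, 1\}, \quad \eta_l \in [0, 1], \\
& \eta_l \le y_l + y_{l+1} \quad (l = 0, \ldots, M,\ \text{with } y_0 = y_{M+1} = 0), \\
& \phi = \sum_{l=0}^{M} \phi_l \, \eta_l, \\
& \sum_{l=0}^{M} \zeta^{\mathrm{PWL},L}_l \, \eta_l \;\le\; \zeta \;\le\;
  \sum_{l=0}^{M} \zeta^{\mathrm{PWL},U}_l \, \eta_l .
\end{align*}
```

Here $y_l = 1$ selects the interval between grid points $l - 1$ and $l$, so only the two adjacent weights $\eta_{l-1}$ and $\eta_l$ can be non-zero, which reproduces the convex combination used in Lemma 1.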
Proposition 2.
Assume $\sigma$ to be continuously differentiable everywhere in $[\phi^L, \phi^U]$, except possibly in a finite set. Then the PWL lower and upper bounding functions of Lemma 1 converge uniformly to $\sigma$ as $M$ goes to infinity.
Furthermore, define , then for finite values of the error on the lower (resp. upper) bounding in convex (resp. concave) regions of for is given by:
and upper (resp. lower) in concave (resp. convex) regions:
A proof of Proposition 2 is given in Appendix A, alongside an experimental analysis of the convergence rate.
We remark that the PWL bound can be used over all commonly employed activation functions $\sigma$. The only assumption made is that $\sigma$ has a finite number of inflection points over any compact interval of $\mathbb{R}$. For convergence (Prop. 2) we require continuous differentiability almost everywhere, which is satisfied by commonly used activations.
3.2 Fairness Constraint
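The encoding of the fairness constraint within the MILP formulation depends on the specific form of the metric $d_{\text{fair}}$.
Weighted $\ell_p$ Metric: The weighted $\ell_p$ metric can be tackled by employing rectangular approximation regions. While this is straightforward for the $\ell_\infty$ metric, for the remaining cases interval abstraction can be used [dantzig2016linear].
Mahalanobis Metric: We first compute an orthogonal decomposition of $S$ as in $S = V^T \Lambda V$, where $V$ is the eigenvector matrix of $S$ and $\Lambda$ is a diagonal matrix with the eigenvalues of $S$ as entries. Consider the rotated variables $z' = V x'$ and $z'' = V x''$, then we have that Equation (3) can be rewritten as $(z' - z'')^T \Lambda (z' - z'') \leq \epsilon^2$. By simple algebra we thus have that, for each $i$ with $\Lambda_{ii} > 0$, $(z'_i - z''_i)^2 \leq \epsilon^2 / \Lambda_{ii}$. By transforming back to the original variables, we obtain that Equation (3) can be over-approximated by:
$$-\frac{\epsilon}{\sqrt{\operatorname{diag}(\Lambda)}} \leq V x' - V x'' \leq \frac{\epsilon}{\sqrt{\operatorname{diag}(\Lambda)}},$$
where the square root and the division are applied element-wise.
Feature Embedding Metric: We tackle the case in which $\varphi$ used in the metric definition, i.e. $d_{\text{fair}}(x', x'') = \hat{d}(\varphi(x'), \varphi(x''))$, is a NN embedding. This is straightforward as $\varphi$ can be encoded into MILP as for the model constraint.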
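The Mahalanobis case replaces an ellipsoid by an axis-aligned box in rotated coordinates. The sketch below, in plain Python on a two-dimensional toy example, checks that every pair inside the ellipsoid also satisfies the box constraint, and exhibits a pair that lies in the box but not in the ellipsoid. The decomposition $S = V^T \Lambda V$ is assumed, and the rotation `V`, the eigenvalues and all function names are illustrative assumptions.

```python
import math
import random

def rotate(V, x):
    """Compute V x for a matrix given as nested lists."""
    return [sum(V[i][k] * x[k] for k in range(len(x))) for i in range(len(V))]

def mahalanobis_sq(V, eigvals, x1, x2):
    """(x1 - x2)^T V^T diag(eigvals) V (x1 - x2)."""
    z = rotate(V, [a - b for a, b in zip(x1, x2)])
    return sum(lam * zi * zi for lam, zi in zip(eigvals, z))

def box_halfwidths(eigvals, eps):
    """Per-axis bound eps / sqrt(lambda_i); unbounded when lambda_i is zero."""
    return [eps / math.sqrt(lam) if lam > 0 else math.inf for lam in eigvals]

def in_box(V, eigvals, x1, x2, eps, tol=1e-12):
    """Linear over-approximation of the Mahalanobis constraint."""
    z = rotate(V, [a - b for a, b in zip(x1, x2)])
    return all(abs(zi) <= h + tol for zi, h in zip(z, box_halfwidths(eigvals, eps)))

c = 1.0 / math.sqrt(2.0)
V = [[c, c], [-c, c]]   # rows are orthonormal eigenvectors (toy choice)
eigvals = [4.0, 1.0]
eps = 1.0

random.seed(0)
for _ in range(1000):
    x1 = (random.uniform(-1, 1), random.uniform(-1, 1))
    x2 = (random.uniform(-1, 1), random.uniform(-1, 1))
    if mahalanobis_sq(V, eigvals, x1, x2) <= eps ** 2:
        assert in_box(V, eigvals, x1, x2, eps)  # the ellipsoid lies inside the box

# A difference vector near a box corner: accepted by the box, not by the ellipsoid.
corner = (c * 0.45 - c * 0.9, c * 0.45 + c * 0.9)  # equals V^T (0.45, 0.9)
print(in_box(V, eigvals, corner, (0.0, 0.0), eps),
      mahalanobis_sq(V, eigvals, corner, (0.0, 0.0)) <= eps ** 2)
```

The box is what makes the constraint linear; the corner pair shows where the resulting certificate can be conservative.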
3.3 Overall Formulation
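We now formulate the MILP encoding for the over-approximation of IF. For Equation (2), we proceed by deriving a set of approximating constraints for the variables $x'$ and $x''$ by using the techniques described in Section 3.1. We denote the corresponding variables as $\phi'^{(i)}_j$, $\zeta'^{(i)}_j$ and $\phi''^{(i)}_j$, $\zeta''^{(i)}_j$, respectively. The NN final output on $x'$ and on $x''$ will then respectively be $\zeta'^{(L)}$ and $\zeta''^{(L)}$, so that $\delta = \zeta'^{(L)} - \zeta''^{(L)}$. Finally, we over-approximate Equation (3) as described in Section 3.2. In the case of Mahalanobis distance, we thus obtain:
$$\begin{aligned}
\max_{x', x'' \in X} \quad & |\delta| \\
\text{subject to} \quad & \delta = \zeta'^{(L)} - \zeta''^{(L)}, \\
& \phi^{\dagger(i)}_j = \sum_{k} W^{(i)}_{jk} \zeta^{\dagger(i-1)}_k + b^{(i)}_j, \qquad \zeta^{\dagger(0)} = x^{\dagger}, \\
& \text{the PWL constraints of Proposition 1 linking } \zeta^{\dagger(i)}_j \text{ to } \phi^{\dagger(i)}_j, \\
& \text{for } \dagger \in \{', ''\},\ i = 1, \ldots, L,\ j = 1, \ldots, n_i, \\
& -\frac{\epsilon}{\sqrt{\operatorname{diag}(\Lambda)}} \leq V x' - V x'' \leq \frac{\epsilon}{\sqrt{\operatorname{diag}(\Lambda)}}.
\end{aligned} \tag{4}$$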
Though similar, the above MILP differs significantly from those used for adversarial robustness (see e.g. Tjeng2019EvaluatingProgramming). First, rather than looking for perturbations around a fixed point, here we have both $x'$ and $x''$ as variables. Furthermore, rather than being local, the MILP problem for IF is global, over the whole input space $X$. As such, local approximations of non-linearities cannot be used, as the bounding needs to be valid simultaneously over the whole input space. Finally, while in adversarial robustness one can ignore the last sigmoid layer, for IF, because of the two optimisation variables, one cannot simply map from the last pre-activation value to the class probability, so that even for ReLU NNs one needs to employ bounding of non-piecewise-linear activations for the final sigmoid.
By combining the results from this section, we have:
Theorem 1.
Consider $\epsilon \geq 0$, a similarity $d_{\text{fair}}$ and a NN $f^w$. Let $x'_*$ and $x''_*$ be the optimal points for the optimisation problem in Equation (4), and define $\delta_*$ as the corresponding optimal value of $|\delta|$. Then $f^w$ is $\epsilon$-$\delta$-individually fair w.r.t. $d_{\text{fair}}$ for any $\delta \geq \delta_*$.
Theorem 1, whose proof can be found in Appendix A, states that a solution of the MILP problem provides us with a sound estimation of individual fairness of an NN. Crucially, it can be shown that branch-and-bound techniques for the solution of MILP problems converge in finite time to the optimal solution [del2012convergence], while furthermore providing us with upper and lower bounds for the optimal value at each iteration step. Therefore, we have:
Corollary 1.
Let $\delta_k^L$ and $\delta_k^U$ be the lower and upper bounds computed by a MILP solver at step $k$. Then we have that $\delta_k^L \leq \delta_* \leq \delta_k^U$. Furthermore, given a precision $\tau$, there exists a finite $k_*$ such that $\delta_{k_*}^U - \delta_{k_*}^L \leq \tau$.
That is, our method is sound and anytime, as at each iteration step in the MILP solving we can retrieve a lower and an upper bound on $\delta_*$, which can thus be used to provide provable guarantees while converging to $\delta_*$ in finite time.
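The anytime property can be read as a three-way decision rule on the solver's current bounds. The sketch below is illustrative only: the function name and the status strings are assumptions, and the bounds are taken to bracket the optimum of the MILP relaxation rather than the true worst-case output change.

```python
def certification_status(bound_lower, bound_upper, delta):
    """Decide what the solver's current bounds say about a threshold delta.

    bound_lower, bound_upper : bounds on the MILP optimum at the current step
    delta                    : fairness threshold to certify against
    """
    if bound_lower > bound_upper:
        raise ValueError("lower bound exceeds upper bound")
    if bound_upper <= delta:
        # The MILP optimum over-approximates the true worst case, so this is sound.
        return "certified"
    if bound_lower > delta:
        # The relaxation cannot certify delta; the NN itself may still be fair.
        return "not certifiable"
    return "undecided"  # keep solving, or stop and report the upper bound

print(certification_status(0.02, 0.08, delta=0.10))
print(certification_status(0.12, 0.30, delta=0.10))
print(certification_status(0.05, 0.30, delta=0.10))
```

Stopping early therefore never yields an unsound certificate; it only risks leaving the question undecided.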
Complexity Analysis
The encoding of the model constraint can be done in , where is the maximum width of , is the number of layers, and is the number of grid points used for the PWL bound. The computational complexity of the fairness constraints depends on the similarity metric employed. While for no processing needs to be done, the computational complexity is for the Mahalanobis distance and again for the feature embedding metric. Each iteration of the MILP solver entails the solution of a linear programming problem and is hence . Finite time convergence of the MILP solver to with precision is exponential in the number of problem variables, in and .
3.4 Fairness Training for Neural Networks
The IF MILP formulation introduced in Section 3 can be adapted for the solution of Problem 2. The key step is the computation of $x_i^*$ in the second component of the modified loss introduced in Problem 2, which is used to introduce fairness directly into the loss of the neural network. This computation can be done by observing that, for every training point $x_i$ drawn from $D$, the computation of $x_i^*$ is a particular case of the formulation described in Section 3, where, instead of having two variable input points, only one input point is a problem variable while the other is given and drawn from the training dataset $D$. Therefore, $x_i^*$ can be computed by solving the MILP problem, where we fix a set of the problem variables to $x_i$, and can be subsequently used to obtain the value of the modified loss function. Note that these constraints are not cumulative, since they are built for each mini-batch and discarded after the optimisation is solved to update the weights.
We summarise our fairness training method in Algorithm 1. For each batch in each of the training epochs, we perform a forward pass of the NN to obtain the output (line 5). We then formulate the MILP problem as in Section 3 (line 6), and initialise an empty set variable to collect the solutions to the various sub-problems (line 7). Then, for each training point $x_i$ in the mini-batch, we fix the MILP constraints to the variables associated with $x_i$ (line 9), solve the resulting MILP for $x_i^*$, and place $x_i^*$ in the set that collects the solutions. Finally, we compute the NN predictions on the collected points (line 13); the result is used to compute the modified loss function (line 14) and the weights are updated by taking a step of gradient descent. The resulting set of weights balances the empirical accuracy and fairness around the training points.
The choice of $\lambda$ affects the relative importance of standard training w.r.t. the fairness constraint: $\lambda = 1$ is equivalent to standard training, while $\lambda = 0$ only optimises for fairness. In our experiments we keep $\lambda$ fixed for half of the training epochs, and then change its value.
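The overall shape of the training loop can be sketched on a toy problem. The block below is not the MILP-based procedure: it assumes a one-dimensional linear model $f(x) = wx + b$ with similarity $|x - x'|$, for which the worst-case neighbour is available in closed form, and uses that analytic oracle as a stand-in for the per-point MILP solve. All names (`worst_case_neighbour`, `train`) and hyperparameters are assumptions, and the input domain is ignored.

```python
def worst_case_neighbour(w, x, eps):
    """Analytic stand-in for the per-point MILP solve.

    For f(x) = w * x + b and d_fair = |x - x'|, the largest output change
    within distance eps is |w| * eps, attained at either end of the interval.
    """
    return x + eps if w >= 0 else x - eps

def train(data, eps, lam, lr=0.1, epochs=500):
    """Per-sample gradient descent on lam * squared error + (1 - lam) * worst-case gap."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            x_star = worst_case_neighbour(w, x, eps)  # treated as fixed in the gradient
            residual = w * x + b - y
            gap = w * (x - x_star)                    # f(x) - f(x_star)
            sign = (gap > 0) - (gap < 0)
            grad_w = lam * 2.0 * residual * x + (1.0 - lam) * sign * (x - x_star)
            grad_b = lam * 2.0 * residual
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

data = [(0.0, 0.0), (0.5, 1.0), (1.0, 2.0)]  # toy targets y = 2x
w_std, _ = train(data, eps=0.5, lam=1.0)
w_fair, _ = train(data, eps=0.5, lam=0.5)
print(w_std, w_fair)
```

In this toy setting the fairness term acts as a penalty on the slope, so the fair model trades some fit for a smaller worst-case output change; with a NN, the closed-form oracle would be replaced by the MILP solve described above.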
4 Experiments
In this section, we empirically validate the effectiveness of our MILP formulation for computing IF guarantees as well as for fairness training of NNs. We perform our experiments on four UCI datasets [UCIDatasets]: the Adult dataset (predicting income), the Credit dataset (predicting payment defaults), the German dataset (predicting credit risk) and the Crime dataset (predicting violent crime). In each case, features encoding information regarding gender or race are considered sensitive. In the certification experiments we employ a precision for the MILP solvers of and a time cutoff of seconds. We compare our training approach with two different learning methods: Fairness-Through-Unawareness (FTU), in which the sensitive features are simply removed, and SenSR [yurochkin2020training]. Exploration of the cutoff, group fairness, certification of additional NNs, scalability of the methods and additional details on the experimental settings are given in Appendices C and D. An implementation of the method and of the experiments can be found at https://github.com/eliasbenussi/nncertindividualfairness.
Fairness Certification
We analyse the suitability of our method in providing non-trivial certificates on IF with respect to the similarity threshold $\epsilon$ (which we vary from to ), the similarity metric $d_{\text{fair}}$, the width of the NN (from to ), and its number of layers (from to ). These reflect the characteristics of NNs and metrics used in the IF literature [yurochkin2020training, ruoss2020learning, Urban2020PerfectlyNetworks]; for experiments on larger architectures, demonstrating the scalability of our approach, see Appendix D.3. For each dataset we train the NNs by employing the FTU approach.
The results for these analyses are plotted in Figure 2 for the Adult and the Crime datasets (results for Credit and German datasets can be found in Appendix D.1). Each heat map depicts the variation of $\delta_*$ as a function of $\epsilon$ and the NN architecture. The top row in the figure was computed by considering the Mahalanobis similarity metric; the bottom row was computed for a weighted $\ell_p$ metric (with coefficients chosen as in john2020verifying); results for the feature embedding metrics are given in Appendix D.2. As one might expect, we observe that, across all the datasets and architectures, increasing $\epsilon$ correlates with an increase in the values for $\delta_*$, as higher values of $\epsilon$ allow for greater feature changes. Interestingly, $\delta_*$ tends to decrease (i.e., the NN becomes more fair) as we increase the number of NN layers. This is the opposite of what is observed for adversarial robustness, where increased capacity generally implies more fragile models [madry2017towards]. In fact, as those NNs are trained via FTU, the main sensitive features are not accessible to the NN. A possible explanation is that, as the number of layers increases, the NN's dependency on the specific value of each feature diminishes, and the output becomes dependent on their non-linear combination. The result suggests that over-parametrised NNs could be more adept at solving fair tasks (at least for IF definitions), though this would come with a loss of model interpretability, and further exploration would be needed to assess under which conditions this holds. Finally, we observe that our analysis confirms how FTU training is generally insufficient in providing fairness on the model behaviour for IF. For each model, individuals that are dissimilar by can already yield a , meaning they would get assigned to different classes if one were using the standard classification threshold of .
Fairness Training
We investigate the behaviour of our fairness training algorithm for improving IF of NNs. We compare our method with FTU and SenSR [yurochkin2020training]. For ease of comparison, in the rest of this section we measure fairness with $d_{\text{fair}}$ equal to the Mahalanobis similarity metric, with , for which SenSR was developed.
The results for this analysis are given in Figure 3, where each point in the scatter plot represents the values obtained for a given NN architecture. We train architectures with up to hidden layers and units, in order to be comparable to those trained by yurochkin2020training. As expected, we observe that FTU performs the worst in terms of certified fairness, as simple omission of the sensitive features is unable to obfuscate latent dependencies between the sensitive and non-sensitive features. As previously reported in the literature, SenSR significantly improves on FTU by accounting for the latent dependencies between features. However, on all four datasets, our MILP-based training methodology consistently improves IF by orders of magnitude across all the architectures when compared to SenSR. In particular, for the architectures with more than one hidden layer, on average, MILP outperforms FTU by a factor of and SenSR by . Intuitively, while SenSR and our approach have a similar formulation, the former is based on gradient optimisation so that no guarantees are provided in the worst case for the training loss. In contrast, by relying on MILP, our method optimises the worst-case behaviour of the NN at each step, which further encourages training of individually fair models. The cost of the markedly improved guarantees is, of course, a higher computational cost. In fact, the training of the models in Figure 3 with MILP had an average training time of about hours. While the increased cost is significant, we highlight that this is a cost that is only paid once and may be justified in sensitive applications by the necessity of fairness at deployment time. We furthermore notice that, while our implementation is sequential, parallel per-batch solution of the MILP problems during training would markedly reduce the computational time; we leave the parallelisation and tensorisation of the techniques for future work.
Interestingly, we find that balanced accuracy also slightly improved with SenSR and MILP training in the tasks considered here, possibly as a result of the bias in the class labels w.r.t. sensitive features. Finally, in Figure 4 we further analyse the certified profile w.r.t. the input similarity $d_{\text{fair}}$, varying the value of $\epsilon$ used for the certification of IF. In the experiment, both SenSR and MILP are trained with , which means that our method, based on formal IF certificates, is guaranteed to outperform SenSR up until (as in fact is the case). Beyond , no such statement can be made, and it is still theoretically possible for SenSR to outperform MILP in particular circumstances. Empirically, however, MILP-based training still largely outperforms SenSR in terms of certified fairness obtained.
5 Conclusion
We introduced an anytime MILP-based method for the certification and training of IF in NNs, based on PWL bounding and MILP encoding of non-linearities and similarity metrics. In an experimental evaluation comprising four datasets, a selection of widely employed NN architectures and three types of similarity metrics, we empirically found that our method is able to provide the first non-trivial certificates for IF in NNs and yields NNs which are, consistently, orders of magnitude more fair than those obtained by a competitive IF training technique.
Acknowledgements
This project was funded by the ERC European Union’s Horizon 2020 research and innovation programme (FUN2MODEL, grant agreement No. 834115).
References
Appendix to:
Individual Fairness Guarantees for Neural Networks
In Section A we empirically investigate the convergence of the PWL bounds w.r.t. $M$ in the sigmoid case, and provide detailed proofs for the propositions and the theorem from the main paper. In Section B we discuss how the learning of the similarity metric was performed. Section C details the experimental settings used in the paper and briefly describes fairness-through-unawareness and SenSR. Finally, additional experimental results on group fairness, verification, and feature embedding metrics are given in Section D.
Appendix A Additional Details on MILP
A.1 Analysis of Number of Grid Points
Interestingly, by inspecting the error bounds derived in Proposition 2 we notice how the uniform error of the PWL bounds goes to zero with the product between the inverse of $M$ and the increments of the derivative of $\sigma$ parametrised with the inverse of $M$. In practice, this means that choosing the interval points of the grid adaptively depending on the values of $\sigma'$ yields an improved rate of convergence for the bounds. In fact, in Appendix A, by choosing the grid points in inverse proportion to in practice, for , we have almost perfect overlap of the PWL with $\sigma$. We visualised this in Figure 1 in the main paper, where we plot the lower and upper PWL functions used in our MILP construction (the plots illustrate the explicit case of the sigmoid activation function in the interval ). The inflection point in the case of the sigmoid is at the axis origin, so it is straightforward to discretise the x-axis into convex and concave parts of the sigmoid. In particular, we achieve this by using a non-uniform discretisation of the x-axis that follows the y-axis of the plot. Empirically, we found that this provides better bounds than a uniform x-axis discretisation in the case in which $M$ (the number of grid points used) is small. The figures visually show how the bounds converge as $M$ increases. Already for the maximum approximation error is of the order of , and thus this is the value we utilise in the experiments.
Proof of Proposition 1
Consider the $j$th activation function of the $i$th layer. We want to show that every time $\zeta_j^{(i)} = \sigma^{(i)}(\phi_j^{(i)})$ it follows that there exist values for the auxiliary continuous and binary variables such that $\zeta_j^{(i)}$ satisfies the constraints in the proposition statement. This would imply that the feasible region defined by the latter is larger than that defined by $\zeta_j^{(i)} = \sigma^{(i)}(\phi_j^{(i)})$, and that it hence provides a safe over-approximation of it.
By using Lemma 1, we know that
where we notice that . By employing the Special Ordered Set (SOS) 2 reformulation of piecewise functions [milano2000benefits], we then obtain:
which is equivalent to the Proposition statement.
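| Dataset | Model | Learning Rate | Regularization | Epochs | Hidden Layers |
| --- | --- | --- | --- | --- | --- |
| Adult | FTU | 0.025 | 0.0125 | 35 | [8], [16], [24], [64], [8,8], [16,16] |
| Adult | SenSR | 0.001 | 0.05 | 400 | [8], [16], [24], [64], [8,8], [16,16] |
| Adult | MILP | 0.001 | 0.05 | 400 | [8], [16], [24], [64], [8,8], [16,16] |
| Credit | FTU | 0.002 | 0.02 | 50 | [8], [16], [24], [64], [8,8], [16,16] |
| Credit | SenSR | 0.0025 | 0.04 | 100 | [8], [16], [24], [64], [8,8], [16,16] |
| Credit | MILP | 0.0025 | 0.04 | 100 | [8], [16], [24], [64], [8,8], [16,16] |
| German | FTU | 0.001 | 0.02 | 35 | [8], [16], [24], [64], [8,8], [16,16] |
| German | SenSR | 0.0025 | 0.04 | 250 | [8], [16], [24], [64], [8,8], [16,16] |
| German | MILP | 0.0025 | 0.04 | 250 | [8], [16], [24], [64], [8,8], [16,16] |
| Crime | FTU | 0.001 | 0.02 | 35 | [8], [12], [16], [24], [8,8], [16,16] |
| Crime | SenSR | 0.025 | 0.025 | 100 | [8], [12], [16], [24], [8,8], [16,16] |
| Crime | MILP | 0.025 | 0.025 | 100 | [8], [12], [16], [24], [8,8], [16,16] |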
Proof of Proposition 2
For simplicity of notation, we drop the subscripts and superscripts from the proof, and refer to a general activation $\sigma$ of a general hidden layer of the NN $f^w$.
Without loss of generality, assume the non-linearity $\sigma$ is convex in $[\phi_{l-1}, \phi_l]$, with $l \in \{1, \ldots, M\}$ (the concave case follows symmetrically from the convex one by considering $-\sigma$).
Following the construction discussed in Section 3.1, the lower bound in this case is given by the tangent through the midpoint, i.e., , where , where . We consider the lower bounding error . By definition of convexity and differentiability of we have:
Hence, for the error we obtain the following chain of inequalities:
which can be reformulated in terms of :
(5) 
For the upperbound function, we have: . Again by convexity we obtain:
so that for the error we have the following chain of inequalities:
Hence, by rewriting it in terms of , we obtain:
(6) 
Proof of Theorem 1
The theorem statement follows if we show that the feasible region of the MILP of Equation (4) overapproximates the feasible region of the individual fairness optimisation problem whose constraints are given in Equations (2) and (3). In fact, if this holds then any solution of the optimisation problem of Equation (4) would provide an upper bound to the solution of Problem 1, so that for any we would have that is IF.
Fairness Constraint: For the fairness constraint, this follows directly from the construction of Section 3.1, so that we have that implies .
Model Constraint: We first rewrite the NN explicitly by using the notation of Equation (1) in and , so that we have , and for :
The first two constraints in each of the two rows above are already linear, and appear in this form in the MILP formulation. For the activation constraints, i.e. for and , we proceed by computing PWL lower and upper bound functions using Lemma 3.1 and converting them into MILP form using Proposition 1. This yields the final form of the MILP.
Appendix B Metric Learning
Recently, a line of work aimed at practical methods of learning more expressive fair distance metrics from data has been developed [ilvento19metric, mukherjee20simple, yurochkin2020training]. In this section we expand on the methodology used for metric learning in our experiments.
B.1 Mahalanobis
To learn the similarity metric in the form of a Mahalanobis distance, we rely on the techniques described in yurochkin2020training that form the basis of the SenSR approach (to which we compare in our experiments). Briefly, this works as follows. Consider for simplicity the case of one sensitive feature (e.g., race) with possible categorical values. We train a softmax model to predict each value of the sensitive feature by relying on the nonsensitive features. Let denote the feature vector corresponding to only the nonsensitive features, and similarly let denote the sensitive features. We then have:
(7) 
where indicates the confidence given by the softmax model to the sensitive feature having the th value. Intuitively, the vector , for , then represents a sensitive direction in the nonsensitive features space that correlates to the th value of . We then stack the weights of each model, defining the matrix , and compute its matrix span , which combines all the sensitive directions in defining a sensitive subspace. We finally find its orthogonal projector , which is then used to define the Mahalanobis distance metric as: .
In the case in which the sensitive feature has a continuous rather than a categorical value, the softmax model of Equation (7) can be replaced by a linear fitting model, and the remainder of the computation follows analogously. Finally, we remark that in the case in which many features are selected as sensitive, one can proceed similarly to what has been described just above, by learning a different model for each sensitive feature, and then stacking all the weights obtained together when defining the matrix .
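The construction above can be sketched in a few lines once the sensitive directions are available. The example below takes the stacked weight vectors as given (the softmax or linear fitting step is omitted), builds an orthonormal basis of their span by Gram-Schmidt, and measures the distance after removing the component that lies in the sensitive subspace. Using the projector onto the orthogonal complement of the span is an assumption made here, following the usual SenSR convention; the function names and the toy directions are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-10):
    """Modified Gram-Schmidt: orthonormal basis of the span of `vectors`.
    Linearly dependent directions are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = dot(w, q)
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

def sensitive_distance(x1, x2, directions):
    """sqrt((x1 - x2)^T (I - P) (x1 - x2)), with P the orthogonal projector
    onto the span of the sensitive directions (assumed convention)."""
    diff = [a - b for a, b in zip(x1, x2)]
    for q in orthonormal_basis(directions):
        coeff = dot(diff, q)
        diff = [di - coeff * qi for di, qi in zip(diff, q)]  # remove sensitive component
    return math.sqrt(dot(diff, diff))
```

Under this convention, two individuals that differ only along a sensitive direction are at distance zero, so the fairness condition requires the model to treat them (almost) identically.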
B.2 Weighted
For ease of comparison, we rely on the approach of john2020verifying, which in particular focuses on a weighted metric, setting the weights to zero for the sensitive features and to a common for all the remaining features (we remark that our method is not limited just to , but can be used for any general weighted metric). In the experiments described in Section 4 of the main paper, we consider multiple values for varying from to .
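As a minimal illustration of such a weighted metric, the sketch below assigns weight zero to the sensitive coordinates and a common weight to all others. The choice of a maximum-type (ℓ∞-style) norm is an assumption made for this example only, as are the names `make_weights`, `weighted_linf` and the parameter `theta`.

```python
def make_weights(n_features, sensitive_idx, theta):
    """Weight 0 for sensitive features, a common weight theta for the rest."""
    return [0.0 if i in sensitive_idx else theta for i in range(n_features)]

def weighted_linf(x1, x2, weights):
    """max_i weights[i] * |x1[i] - x2[i]| (assumed form of the weighted metric)."""
    return max(w * abs(a - b) for w, a, b in zip(weights, x1, x2))
```

With zero weight on the sensitive coordinates, two individuals that differ only in those coordinates are at distance zero, regardless of how large that difference is.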
B.3 Feature Embedding
In addition to the Mahalanobis and weighted distance metrics, we also allow for the metric to be computed on an embedding. Intuitively, this allows for more flexibility in modelling the intra-relationship between the sensitive and nonsensitive features in each data point and can be used to certify individual fairness in data representations such as those discussed by [ruoss2020learning]. As a proof of concept, we do this by learning a one-layer neural network embedding of neurons, and employ the weighted metric. Results for this analysis are given in Section D.2.
Appendix C Experimental Setting
In this section we describe the datasets used in this paper and any preprocessing performed prior to training and certification. We then report the hyperparameter values used to train the different models used in the experiments. All experiments were run on an NVIDIA 2080Ti GPU with a 20-core Intel Core Xeon 6230.
C.1 UCI Datasets
We consider the following UCI datasets [UCIDatasets], popular in the fairness literature, with the first three being binary classification tasks and the last one being a regression task. For all datasets we take an 80/20 train/test split, drop features with missing values, normalise continuous features and one-hot encode categorical features.
Adult: the objective is to classify whether individuals earn more or less than $50K/year (binary classification). Here we follow similar preprocessing steps as yurochkin2020training. After removing native-country and education, and preprocessing, the dataset has 40 features, 45,222 data points, a 0.24/0.76 class imbalance, and sex and race are considered categorical sensitive attributes.
Credit: the goal is to predict whether people will default on their payments (binary classification). After preprocessing, the dataset has 144 features, 30,000 data points, a 0.22/0.78 class imbalance, and x2 (corresponding to sex) is considered a sensitive attribute.
German: the goal is to classify individuals as good/bad credit risks (binary classification). After preprocessing, the dataset has 58 features, 1000 data points, a 0.3/0.7 class imbalance and status_sex is considered a categorical sensitive attribute.
Crime: the goal is to predict the normalised total number of violent crimes per 100K population. After preprocessing, the dataset has 97 features, 1993 data points, and racepctblack, racePctWhite, racePctAsian, racePctHisp are considered continuous sensitive attributes. The true label distribution of this dataset is very imbalanced, as shown in Figure 6.
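The common preprocessing steps listed at the start of this subsection can be sketched as follows. This is only an illustration on rows stored as dictionaries: min-max scaling is an assumed choice of normalisation, statistics are computed on the full data before splitting for brevity, and the function names and the fixed seed are hypothetical.

```python
import random

def one_hot(values):
    """One-hot encode a categorical column (categories in sorted order)."""
    cats = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in cats] for v in values]

def min_max(values):
    """Scale a continuous column to [0, 1] (assumed normalisation)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

def preprocess(rows, categorical):
    """rows: list of dicts; categorical: set of categorical column names."""
    # Drop every feature that has at least one missing value.
    cols = [c for c in sorted(rows[0]) if all(r[c] is not None for r in rows)]
    encoded = [[] for _ in rows]
    for c in cols:
        column = [r[c] for r in rows]
        if c in categorical:
            for out, vec in zip(encoded, one_hot(column)):
                out.extend(vec)
        else:
            for out, v in zip(encoded, min_max(column)):
                out.append(v)
    return encoded

def train_test_split(data, test_frac=0.2, seed=0):
    """Shuffle with a fixed seed and split into train and test parts."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    cut = int(round(len(data) * (1.0 - test_frac)))
    return [data[i] for i in idx[:cut]], [data[i] for i in idx[cut:]]
```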
C.2 Hyperparameters
The hyperparameters used to train all of the FTU, SenSR and MILP models used in the experiments are reported in Table 1. The hidden layer values were selected to match the type of models trained in the related literature (e.g. yurochkin2020training, Urban2020PerfectlyNetworks, ruoss2020learning). The values of learning rate, regularisation and number of epochs were selected through hyperparameter tuning, to provide accuracy results matching those found in the literature.
C.3 Training Methods
Below we describe the alternative fair training methods that are employed for comparison with our proposed training method. We note that for all methods, categorical variables are one-hot encoded, and, since MILP solvers can deal with both continuous and integer variables, no further processing is required.
Fairness through unawareness (FTU)
The general principle of fairness through unawareness training is that by removing the sensitive features (e.g. features containing information about gender or race) the classifier will no longer use such information to make decisions. Despite removal of the sensitive features, it is often found that these have correlations with nonsensitive features, which can lead to classifiers that are still greatly influenced by the sensitive features [pedreshi2008discrimination].
SenSR
SenSR is a methodology proposed by yurochkin2020training that leverages PGD to generate individually unfair adversarial examples to augment the training procedure. It supports similarity metrics in the form of a Mahalanobis distance, akin to the one we describe in Subsection B.1. We adapt their code to work on both binary classification and regression tasks to compare with our MILP method. Our MILP method bears many similarities to theirs, which is why we use it for comparison. However, while both training methods rely on adversarial training to mitigate unfairness, SenSR does not provide any verification methodology. Furthermore, our MILP training, while being meaningfully more computationally intensive, achieves better local optimisation, and upon verification proves to train models that are orders of magnitude fairer than SenSR.