Fairness by Explicability and Adversarial SHAP Learning

The ability to understand and trust the fairness of model predictions, particularly when considering the outcomes of unprivileged groups, is critical to the deployment and adoption of machine learning systems. SHAP values provide a unified framework for interpreting model predictions and feature attribution, but they do not address the problem of fairness directly. In this work, we propose a new definition of fairness that emphasises the role of an external auditor and model explicability. To satisfy this definition, we develop a framework for mitigating model bias using regularizations constructed from the SHAP values of an adversarial surrogate model. We focus on the binary classification task with a single unprivileged group and link our fairness explicability constraints to classical statistical fairness metrics. We demonstrate our approaches using gradient and adaptive boosting on a synthetic dataset, the UCI Adult (Census) dataset and a real-world credit scoring dataset. In each case, the models produced were fairer while remaining performant.

1 Introduction

The last few decades have seen machine learning algorithms become increasingly performant and leverage ever larger varieties of data. These advances have led to widespread adoption of machine learning in nearly every industry. The potential damage and wider societal harm that could be caused by large-scale automated decisioning systems is palpable amongst regulators, industry practitioners and consumers [Recidivism2017, WeaponsMathDestruction2016, PredictAndServe2016]. Two specific concerns that have emerged center on the interpretability and fairness of the decisions resulting from these algorithms. These concerns are not unjustified, with cases of unfair decisioning systems manifesting in multiple domains, from criminal recidivism [Recidivism2017] to credit worthiness assessment. In the European Union, these concerns have manifested in the General Data Protection Regulation [GDPR, GDPRML2017], which enshrines each individual’s right to fair and transparent processing. This combined societal and legislative scrutiny has brought model interpretability and algorithmic fairness to the fore in research [Dwork2012, FairnessSurvey2019].

At the broadest level, the concept of algorithmic fairness tackles whether members of specific unprivileged groups are more likely to receive unfavourable decisions from the predictions of a machine learning system. Recent advances have enabled modellers to incorporate fairness at every point of the model building process [FriedlerComparison, FairnessSurvey2019, Framework2019, DesigningFairAlgos2018]. One embodiment incorporates fairness constraints into the training procedure [PredRemover2012, MetaLearn2019, FairBoost2020, Goel2018, IBM2018, Nabi2018, FNN2018, FairRed2018]; typically, these constraints rely on statistical measures of fairness and are subject to drawbacks [FairnessImpossibility2018] and trade-offs. These measures rely on a priori worldviews and do not incorporate the role of external model auditing or decision explicability in their fairness criteria. This is poorly aligned with how these issues are dealt with in industry, where external actors often question a model’s fairness by building surrogate explanatory models, even if only mentally, using the information available to them.

To address these issues, we propose a new definition of fairness we dub “Fairness by Explicability”. Under this definition, if an external actor’s surrogate model cannot produce a narrative (i.e., a set of explanations) against the fairness of a particular model, then that particular model can be considered explicably fair. This definition explicitly frames the perception of an algorithm’s fairness as one determined by a combination of an auditor’s worldview, data availability, model interpretability framework and measurement/modelling approach. It can be considered complementary to the existing ways of evaluating a model’s fairness, since while those may capture risk arising from non-adherence to regulatory requirements, our new “fairness by explicability” viewpoint captures the additional and independent risk that may arise from analyses performed by one’s own clients [times].

To enforce our “Fairness by Explicability” definition, we leverage model interpretability methodologies [LIME2016, SHAP2017, Hastie2019] to incorporate fairness constraints through adversarial learning. More explicitly, we utilize the SHAP [SHAP2017, TreeSHAP2019] values of a surrogate adversary model in two ways. The first works by constructing a differentiable fairness regularization term. The second is a modification to the classic AdaBoost algorithm [AdaBoost1997] to include adversarial attribution values in the weight updates.

We link our fairness approach to statistical fairness [Metrics2018] via the construction of an appropriate surrogate model. Our approaches are illustrated using a synthetic dataset, the UCI Adult Census Dataset [uci], and a commercial credit scoring dataset (private and internal to Experian). These datasets present a diverse evaluation set, with the real-world dataset providing assurance that these approaches are viable in industrial applications. The structure of the paper is as follows: in Section 2 we introduce our notation; in Section 3 we provide a brief account of SHAP values and Section 4 discusses statistical fairness measures. Section 5 introduces the “Fairness by Explicability” worldview and in Section 6 we present our SHAP-regularized algorithms before discussing the results of the experiments in Section 7. We then state our conclusions and highlight areas of further research in Section 8.

2 Notation

To measure the fairness of any algorithm's output one needs to define the task objective, the un-/privileged groups to measure fairness against and the favourable outcomes. For the remainder of this paper, we focus on binary classification tasks with a single privileged group indicator $D$, with $D = 1$ denoting the privileged group. We denote the other covariates present by $X$ and the combination of $D$ with those covariates by $Z = (X, D)$. Furthermore, and without loss of generality, we define the value $1$ for the target $Y$ and the corresponding model outcomes $\hat{Y}$ as the favourable label. Model outcomes are constructed by applying a threshold $t$ to the scores $S = f(Z)$. For each instance $i$, we denote the corresponding values with the appropriate lowercase symbol and subscript, i.e. $d_i$, $x_i$, $y_i$, $s_i$, etc. In this case, $x_i$ and $z_i$ denote vectors and the value of the $j$th covariate is given by $x_i^{(j)}$ and $z_i^{(j)}$.

3 SHapley Additive Explanations (SHAP)

SHapley Additive Explanations, or SHAP values [SHAP2017, TreeSHAP2019], provide a unified framework for interpreting model predictions. This approach was built off the insight that many other modern explanatory frameworks, such as LIME [LIME2016] and DeepLIFT [DeepLIFT2017], can be recast as variants of a generic additive feature attribution paradigm. In this paradigm, a simplified explanatory model is built to explain the original prediction using simplified binary input vectors $z_i' \in \{0, 1\}^{M}$, where $M$ is the number of features and $i$ is the instance label. These simplified inputs are related to the original feature vectors through a mapping $z_i = m_{z_i}(z_i')$ and the local explanatory model $g$ is given by:

$g(z_i') = \phi_0 + \sum_{j=1}^{M} \phi_j\, z_{i,j}'$   (1)

The local feature effect of feature $j$ for model $f$ is $\phi_j$, and global explanations are calculated via the statistics of these values across a dataset. The different explanatory frameworks, e.g. LIME, emerge from specific choices of the mapping function $m$, the kernel weighting of instances in the objective used to fit $g$, and any additional regularization terms. These choices influence the properties of the surrogate model. In Ref. [SHAP2017], it was shown that only one attribution scheme satisfies the following three desirable properties:

  1. Local Accuracy: $g(z_i') = f(z_i)$ when $z_i = m_{z_i}(z_i')$.

  2. Missingness: $z_{i,j}' = 0 \Rightarrow \phi_j = 0$.

  3. Attribution Consistency: for any two models $f$ and $f'$, the ordering of the differences of the model output when a feature is present vs missing is reflected in their respective attributions of that feature.

Its attributions $\phi_j$ are the same Shapley values first identified in cooperative game theory [NPersonGame1953, RegressionSHAPLEY2001, SHAPUnique1985, LinearSHAP2014]:

$\phi_j = \sum_{z' \subseteq z_i'} \frac{|z'|!\,(M - |z'| - 1)!}{M!} \left[ f_{z_i}(z') - f_{z_i}(z' \setminus j) \right]$   (2)

Here, $|z'|$ is the number of non-zero entries in $z'$, $z' \setminus j$ denotes setting the $j$th element of $z'$ to $0$, $f_{z_i}(z')$ denotes the model output with only the non-zero features of $z'$ present, and the summation is over all $z'$ whose non-zero entries are a subset of the non-zero entries of $z_i'$. These SHAP values can be estimated for a generic model using KernelSHAP [SHAP2017], while for specific model families there are efficient computational methods and analytic approximations [TreeSHAP2019, LinearSHAP2014].
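In practice, these values can be computed with the open-source shap package. The following is a minimal sketch, assuming TreeSHAP for an XGBoost classifier and using randomly generated placeholder data in which the last column of the feature matrix plays the role of the protected attribute:

import numpy as np
import shap
import xgboost as xgb

# Hypothetical placeholder data: the last column of Z acts as the protected
# attribute D and y is a binary target.
rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 5))
Z[:, -1] = rng.integers(0, 2, size=1000)
y = rng.integers(0, 2, size=1000)

model = xgb.XGBClassifier(n_estimators=50, max_depth=3).fit(Z, y)

# TreeSHAP [TreeSHAP2019] computes SHAP values efficiently for tree ensembles;
# shap.KernelExplainer is the model-agnostic (KernelSHAP) alternative.
explainer = shap.TreeExplainer(model)
phi = explainer.shap_values(Z)          # one attribution per instance and feature

print("mean attribution of the protected column:", phi[:, -1].mean())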

4 Metrics and Statistical Fairness

To estimate fairness metrics one requires a dataset of instances with $D$ and $Y$ as well as the model outcomes $\hat{Y}$. Given this data, the appropriate fairness metric is often defined by the worldview(s) [Worldviews2019] of those auditing the outcomes. These worldviews tend to fall into three broad categories: “We’re all equal” [barocas2016big], “What you see is what you get” [Dwork2012, roemer2015equality] and causal [Kilbertus2017, ZhangDirectEffect, Kusner2017, CounterfactualFairGDPR2017, FairBoost2020, Chiappa2018]. The first two categories are statistical in nature and we now discuss their application to the binary task domain.

Statistical fairness metrics relate to the conditional probabilities involving $\hat{Y}$, $Y$ and $D$. The “We’re all equal” worldview has numerous group fairness metrics associated with it. These metrics measure any differences in outcome given group membership and seek to balance said outcomes. Contrastingly, “What you see is what you get” asserts that the observed data captures the underlying “truth” and typically prefers to offer individuals similar outcomes conditional on $Y$. In this work, we consider two of the most common statistical fairness metrics from these categories: “statistical parity” difference (SPD) and “equality of opportunity” difference (EOD). More formally, these are defined as:

$\mathrm{SPD} = P(\hat{Y} = 1 \mid D = 0) - P(\hat{Y} = 1 \mid D = 1)$   (3)
$\mathrm{EOD} = P(\hat{Y} = 1 \mid D = 0, Y = 1) - P(\hat{Y} = 1 \mid D = 1, Y = 1)$   (4)

Note that a target SPD value can also be calculated by replacing $\hat{Y}$ with $Y$ in Eq. 3. Both of these measures are estimated from a specified dataset, a value of zero denotes a maximally fair model, and both have trade-offs [FairnessImpossibility2018] and limitations. For example, SPD can be minimized by randomly modifying outcomes while ignoring all other covariates, and so can be viewed as a lazy penalization. Contrastingly, minimizing EOD may not reduce any gap in the rate of favourable outcomes between the groups.
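As a concrete illustration, both metrics can be estimated directly from thresholded outcomes. The sketch below assumes numpy arrays y_hat (predicted outcomes), y (targets) and d (group indicator, with d = 1 the privileged group) and follows the sign convention of Eqs. 3 and 4:

import numpy as np

def statistical_parity_difference(y_hat, d):
    # SPD: difference in favourable-outcome rates between the groups (Eq. 3).
    return y_hat[d == 0].mean() - y_hat[d == 1].mean()

def equal_opportunity_difference(y_hat, y, d):
    # EOD: difference in favourable-outcome rates among instances with y = 1 (Eq. 4).
    fav = y == 1
    return y_hat[(d == 0) & fav].mean() - y_hat[(d == 1) & fav].mean()

# Usage with scores s and threshold t:
# y_hat = (s >= t).astype(int)
# print(statistical_parity_difference(y_hat, d), equal_opportunity_difference(y_hat, y, d))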

5 Fairness by Explicability

The traditional statistical fairness metrics presented in Section 4 are not explicitly linked to the domain of model interpretability, let alone interpretability frameworks such as SHAP [SHAP2017] or LIME [LIME2016]. These measures emerge from the worldviews of individuals auditing the model outcomes for fairness. Typically, when trying to understand observations, a human agent (an external actor/auditor) will construct a surrogate model to obtain explanations for their observations. The role of $D$ in these explanations determines whether the outcomes are perceived as fair or not. Building on this idea, we propose a new worldview to capture the mechanism by which model decisions are evaluated by external actors.

Definition 1

Consider a model $a$ trained by an auditor to predict the outcomes of the model $f$ under audit, using $D$ and, optionally, a combination of covariates from $X$. If this model does not detect any difference in the attribution of $D$ between the groups, then the predictor model $f$ is explicably fair with respect to the auditor.

We dub this worldview “Fairness by Explicability”. The precise measure of fairness one attains is determined by: the population examined by the auditor, the interpretability framework used, how attributions are calculated and aggregated, and the auditor model developed. This definition can be specialized into a strong “Fairness by Explicability” form by further requiring that the total attribution for $D$ is also reduced to zero.

Auditors are usually interested in the average attribution of $D$ for the two groups given a population of data. This informs the metrics used to quantify how “explicably fair” a model is. These are:

$\Delta\bar{\phi}^{a}_{D} = \frac{1}{N_1}\sum_{i:\, d_i = 1}\phi^{a}_{D,i} - \frac{1}{N_0}\sum_{i:\, d_i = 0}\phi^{a}_{D,i}$   (5)
$\Phi^{a}_{D} = \sum_{i=1}^{N}\big|\phi^{a}_{D,i}\big|$   (6)

where $\phi^{a}_{D,i}$ is the SHAP value of $D$ for instance $i$ under the auditor model $a$, $N$ is the total number of instances in the dataset and $N_k$ is the number of examples with $d_i = k$. The first metric, $\Delta\bar{\phi}^{a}_{D}$, measures the difference in mean attribution between the two groups. When it is minimized, the model is considered fair according to our “Fairness by Explicability” definition. The second metric, $\Phi^{a}_{D}$, measures the total attribution of $D$ across the population; when it is minimized, the auditor model concludes that the model satisfies the strong version of “Fairness by Explicability” and, by definition, the first metric is also zero.

From this discussion, “Fairness by Explicability” may appear intuitive but difficult to implement and, in general, being “explicably fair” does not provide any guarantees of statistical fairness. However, an initial informal connection to the prior fairness worldviews can be made through consideration of specific forms of the auditor model. Intuitively, removing the dependency of the scores $S$ on $D$, as measured by an external auditor, will tend to reduce $\hat{Y}$'s dependency on $D$. This will generally lead to improved SPD and EOD, although the decision policy plays a large role in how these two connect.
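As an illustration, an auditor using the univariate linear regression form considered later in this work could estimate both metrics as in the sketch below. This is a minimal sketch under the assumption that the auditor is fitted to the model scores and that its SHAP values take the linear form of Eq. 8; the variable names are illustrative:

import numpy as np
from sklearn.linear_model import LinearRegression

def explicability_metrics(scores, d):
    # Fit the auditor a: scores ~ d, then use the linear-model SHAP values
    # phi_i = beta_D * (d_i - mean(d)) to compute Eqs. 5 and 6.
    auditor = LinearRegression().fit(d.reshape(-1, 1), scores)
    beta = auditor.coef_[0]
    phi = beta * (d - d.mean())
    mean_diff = phi[d == 1].mean() - phi[d == 0].mean()   # Eq. 5
    total_attr = np.abs(phi).sum()                        # Eq. 6 (strong form)
    return mean_diff, total_attr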

6 Achieving Fairness by Explicability

We now present two different approaches for imposing “Fairness by Explicability” directly into the training process of gradient-based and adaptive boosting (specifically AdaBoost) algorithms. These approaches rely on inserting a surrogate model $h$ directly into the iterative training procedure. The form of $h$ is then chosen to account for the examination of an anticipated external auditor whose model is $a$. Both approaches require $D$ during the training phase only, hence any sensitive attributes defining $D$ do not need to be supplied at prediction time. In addition to this presentation, we also discuss how the approaches can be linked to the SPD and EOD.

6.1 SHAPSqueeze

The first approach to imposing “Fairness by Explicability” uses a series of differentiable regularizations to penalize unfair attributions. We consider a differentiable loss function of the form:

$\mathcal{L} = \mathcal{L}_{\mathrm{pred}}(y, S) + \lambda\, \Omega$   (7)

which we can optimize through gradient-based methods, e.g. stochastic gradient descent. At each iteration, a surrogate model $h$ is fit to the score values $S$. From $h$, the SHAP values of $D$, and optionally $X$, are used to calculate the appropriate regularization term $\Omega$. In this work, $\mathcal{L}_{\mathrm{pred}}$ is the binary cross-entropy. Considering the case where $h$ and $a$ are identical, when the associated $\Omega$ is minimized the attributions to $D$ will be zero and strong “Fairness by Explicability” is satisfied by the model scores $S$.

The specific form of $h$ we examine is a linear regression model, see the first row of Table 1. The SHAP values of interest are given by:

$\phi^{h}_{D,i} = \beta_D\,(d_i - \bar{d})$   (8)

where $\beta_D$ is the fitted coefficient of $D$ and $\bar{d}$ is the mean of $D$ over the dataset.

Equation (8) directly relates the SHAP values of $D$ to its model coefficient, $\beta_D$, and the specific realisation of $D$ for instance $i$. The regularization is then simply the sum of the squares of these SHAP values scaled by a constant $C$, see Table 1. This constant is used to make the size of the gradients coming from $\mathcal{L}_{\mathrm{pred}}$ and $\Omega$ comparable, while $\lambda$ is used to adjust the balance between these two quantities. Moreover, we note that the explicability fairness metrics in Eqs. 5 and 6 are proportional to $\beta_D$ in this instance. Therefore, this specific $h$ and $\Omega$ will seek to eliminate the linear dependence of the model predictions on $D$. Consequently, we expect reductions in the SPD as the model becomes explicably fairer.

To conclude, we note that the use of linear regression makes both the model fitting and the SHAP value derivative calculations computationally efficient. However, the approach described is applicable to any $h$ whose SHAP values are differentiable with respect to the scores $S$, and so parametric/kernel regression models could also be employed. In combination with adding more features, this allows for the consideration of more complex auditors with different worldviews.

algorithm | surrogate $h$ | regularization / penalty
SHAPSqueeze | linear regression of the scores $S$ on $D$ | $\Omega = C \sum_i (\phi^{h}_{D,i})^2$
SHAPEnforce | linear regression of the favourable-outcome probability on $D$, fit on instances with $y_i = 1$ | local penalty $P(\phi^{h}_{D,i})$, applied when $y_i = 1$ and weighted by $\eta$
Table 1: The surrogate models and regularizations considered in this work.
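The sketch below illustrates how such a regularized objective could be supplied to gradient boosting as a custom XGBoost objective, using the linear surrogate of Table 1 and the penalty gradient implied by Eq. 8. It is a minimal sketch rather than the exact implementation used in our experiments; the variable names, the choice to fit the surrogate to predicted probabilities and the neglect of the penalty's curvature in the hessian are all assumptions:

import numpy as np
import xgboost as xgb

def shapsqueeze_objective(d, lam=1.0, C=1.0):
    # Binary cross-entropy plus lam * C * sum_i phi_i^2, where phi_i = beta_D * (d_i - mean(d))
    # are the SHAP values of D under a linear surrogate h fitted to the predicted probabilities.
    d_centred = d - d.mean()
    var_d = (d_centred ** 2).sum()

    def objective(preds, dtrain):
        y = dtrain.get_label()
        p = 1.0 / (1.0 + np.exp(-preds))           # raw margins -> probabilities
        beta = (d_centred * p).sum() / var_d        # OLS slope of p on d
        # d(penalty)/dp_i = 2 * C * beta * (d_i - mean(d)), chained through the sigmoid
        grad_pen = 2.0 * C * beta * d_centred * p * (1.0 - p)
        grad = (p - y) + lam * grad_pen             # BCE gradient + penalty gradient
        hess = p * (1.0 - p) + 1e-6                 # BCE hessian; penalty curvature neglected
        return grad, hess

    return objective

# Usage sketch:
# dtrain = xgb.DMatrix(Z, label=y)
# booster = xgb.train({"max_depth": 3}, dtrain, num_boost_round=100,
#                     obj=shapsqueeze_objective(d, lam=5.0))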

6.2 SHAPEnforce

The classic AdaBoost algorithm [AdaBoost1997] trains a model that is a weighted linear combination of weak classifiers. The training process is iterative, with each weak learner $f_m$ being fitted to a reweighted version of the training data. After $R$ iterations, the outputted model is given by $F(z) = \sum_{m=1}^{R} \alpha_m f_m(z)$. We consider learners that output a score and whose classification output, $\hat{y}$, is obtained by thresholding. Traditionally, AdaBoost generates the instance weights for the $m$th training round, $w^{(m)}_i$, by scaling the previous iteration's weights $w^{(m-1)}_i$. Instances that are incorrectly classified have their weights enhanced by a factor $e^{\alpha_{m-1}}$, while correctly classified instances are downweighted by $e^{-\alpha_{m-1}}$. As training proceeds, the algorithm increasingly focuses on erroneous examples to improve its predictive performance. To incorporate “Fairness by Explicability” into AdaBoost, we adjust its reweighting process to consider the SHAP values $\phi^{h}_{D,i}$ of the features of a surrogate $h$. This SHAP weighting is introduced through a penalty function $P$ and a fairness regularization weight $\eta$ which trades off the original weight update against the new penalty.

In effect, this forces weak learners to focus not only on erroneous examples but also on those with specific SHAP values, as determined by $P$ and $h$. This pushes the algorithm to improve its predictions on instances with specific SHAP values and is dubbed “SHAPEnforce”. Furthermore, in contrast to SHAPSqueeze, it is fully non-parametric and only requires that the SHAP values of $h$ can be computed.

0:  training examples $\{(z_i, y_i)\}_{i=1}^{N}$, specification of the favourable outcome, a surrogate model $h$, a SHAP penalty function $P$, a fairness regularization weight $\eta$, and the number of boosting rounds $R$.
1:  INITIALIZE weights $w^{(1)}_i = 1/N$.
2:  for m=1,…,R do
3:     Fit a weak learner $f_m$ using the training data with weights $w^{(m)}_i$.
4:     Compute the probability of favourable outcome, $p^{(m)}_i$, and the predicted label $\hat{y}^{(m)}_i$ from $f_m$.
5:     Fit $h$, taking features $D$ and the target $p^{(m)}_i$ from $f_m$.
6:     Compute the SHAP values $\phi^{h}_{D,i}$ and the corresponding weight adjustment $\eta\, P(\phi^{h}_{D,i})$.
7:     Compute the weighted classification error $\epsilon_m$ of $f_m$.
8:     Compute the learner weight $\alpha_m$ from $\epsilon_m$.
9:     Update the instance weights $w^{(m+1)}_i$ from $w^{(m)}_i$ by combining the error-based factor $e^{\alpha_m \mathbb{1}[\hat{y}^{(m)}_i \neq y_i]}$ with the penalty adjustment $\eta\, P(\phi^{h}_{D,i})$.
10:     Normalize: set $w^{(m+1)}_i \leftarrow w^{(m+1)}_i / \sum_j w^{(m+1)}_j$.
11:  end for
Output:  Classifier $F(z) = \sum_{m=1}^{R} \alpha_m f_m(z)$, thresholded to obtain $\hat{Y}$.
Algorithm 1 SHAPEnforce

Algorithm 1 presents the pseudo-code for “SHAPEnforce”. The learning approach can be qualitatively interpreted as a two-player game. Expanding on this view, at each stage the predictive learner makes a move by constructing a weak learner and attempts to reweight the training data as if the surrogate had not acted up to that point. Similarly, once the learner is constructed, the surrogate acts to reweight the dataset in its own best interest. The regularization weight $\eta$ then controls the resulting outcome between these two competing actions.

In this work, we consider a linear surrogate model trained on the data where $y_i = 1$, whose form and associated penalty $P$ are shown in Table 1. We again approximate the SHAP values using Eq. 8. The considered $P$ is local in nature and, conditioned on $y_i = 1$, will downweight any examples with positive SHAP values while upweighting those with negative values. This forces the predictor model to focus on instances where the attributions of $D$ have a negative impact on the favourable outcome and where the weak learner has made mistakes when the target is favourable, i.e. $y_i = 1$. By focusing on the examples with negative attribution, their attribution will be increased at the next round, hence the explicability fairness, as determined by an equivalent auditor $a$, will tend to improve. This choice of $P$ further reflects the intuition that unprivileged groups are likely to receive unfavourable predictions from weak learners and hence negative attribution. Furthermore, with the focus on examples where $y_i = 1$, we expect this modification to reduce the EOD.
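The sketch below illustrates a single SHAPEnforce boosting round with a decision-stump weak learner and the linear surrogate of Table 1. The precise penalty function and the way the error-based and SHAP-based factors are combined are illustrative assumptions rather than the exact form used in our experiments:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

def shapenforce_round(Z, y, d, w, eta=0.5):
    # One boosting round: fit a weak learner, fit the linear surrogate h on the
    # favourable-outcome subset (y == 1), and reweight instances using both the
    # classification error and the SHAP values of D (Eq. 8).
    stump = DecisionTreeClassifier(max_depth=1).fit(Z, y, sample_weight=w)
    p = stump.predict_proba(Z)[:, 1]
    y_hat = (p >= 0.5).astype(int)

    fav = y == 1
    h = LinearRegression().fit(d[fav].reshape(-1, 1), p[fav])
    phi = h.coef_[0] * (d - d[fav].mean())          # SHAP values of D under h (Eq. 8)

    # Classic AdaBoost error and learner weight.
    err = np.clip(np.sum(w * (y_hat != y)) / np.sum(w), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)

    # Illustrative penalty: on favourable targets, upweight negative attributions
    # of D and downweight positive ones, traded off against the error update by eta.
    penalty = np.where(fav, -phi, 0.0)
    w_new = w * np.exp(alpha * (y_hat != y) + eta * penalty)
    return stump, alpha, w_new / w_new.sum()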

Figure 1: Fairness metrics for SHAPSqueeze plotted with varying regularization strength $\lambda$ for the (a) synthetic, (b) Adult and (c) Credit Risk test datasets. A dataset-specific gradient-scaling constant $C$ was used for each evaluation.

7 Computational Experiments

To evaluate our algorithms we consider three binary classification datasets: a synthetic dataset, the UCI Adult dataset [uci], and a commercial Credit Risk dataset. The train/test splits are shown in Table 2. The datasets were preprocessed so that categorical variables were one-hot encoded and numeric variables were converted to their standard score.
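A minimal sketch of this preprocessing with scikit-learn, where the column lists are placeholders rather than the exact feature names of each dataset:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column lists for one of the datasets.
categorical_cols = ["occupation", "education"]
numeric_cols = ["age", "hours_per_week"]

preprocessor = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ("num", StandardScaler(), numeric_cols),        # standard score (z-score)
])

# X_train_enc = preprocessor.fit_transform(X_train)
# X_test_enc = preprocessor.transform(X_test)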

We exemplify the SHAPSqueeze objectives using XGBoost [XGBoost2016]. In each experiment, we evaluate the algorithms' predictive performance, as measured by accuracy/precision and ROC AUC, as well as measuring the SPD and EOD. To determine these quantities, we use a fixed threshold policy. For SHAPSqueeze, a fixed threshold is used for the synthetic and UCI Adult datasets, while a more risk-averse (higher) threshold is set for the commercial Credit Risk dataset; this higher threshold better reflects real-world business practices in this domain. SHAPEnforce, being a modification to AdaBoost, is less calibrated than the SHAPSqueeze implementation and so a common threshold is used in all cases. Additionally, we build linear regression auditor models $a$ on the test set to measure the explicability fairness. The equations defining $a$ are the same as those of the $h$ employed, and so the explicability fairness is given by the coefficient $\beta_D$ of the fitted $a$, see Table 1. Note that for SHAPEnforce, $a$ is built on the data subset where $y_i = 1$.
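A compact sketch of this evaluation procedure; the threshold value and variable names are placeholders, and the auditor is the univariate linear regression described above:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, precision_score, roc_auc_score

def evaluate(scores, y, d, threshold):
    # Predictive performance plus statistical and explicability fairness on a test set.
    y_hat = (scores >= threshold).astype(int)
    fav = y == 1
    spd = y_hat[d == 0].mean() - y_hat[d == 1].mean()
    eod = y_hat[(d == 0) & fav].mean() - y_hat[(d == 1) & fav].mean()
    auditor = LinearRegression().fit(d.reshape(-1, 1), scores)   # auditor model a
    return {
        "accuracy": accuracy_score(y, y_hat),
        "precision": precision_score(y, y_hat),
        "roc_auc": roc_auc_score(y, scores),
        "SPD": spd,
        "EOD": eod,
        "auditor_coefficient": auditor.coef_[0],     # beta_D, the explicability measure
    }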

dataset | train size | test size
Synthetic | 75000 | 25000
Adult | 32561 | 16281
Credit Risk | 48112 | 23697
Table 2: Datasets used for the algorithm evaluation.

7.1 Datasets

7.1.1 Synthetic Data

The synthetic dataset was generated to exhibit a very large SPD. To construct this, the distribution of $X$ is conditional on $D$ and the target is determined by both $X$ and $D$. Specifically, $D$ was sampled from a Bernoulli distribution and $X$ contains three sets of covariates: “safe” covariates, “proxy” covariates and “indirect effect” covariates, with the latter two sampled from distributions conditional on $D$. From this, the log-odds of the binary target $Y$ are given by a linear combination of the covariates with a coefficient vector of ones, and the target is then sampled from the corresponding Bernoulli distribution. Using this approach we sampled a dataset containing safe, indirect effect and proxy variables, such that the favourable outcomes were obtained disproportionately by the privileged group.
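The sketch below shows a generator of this form. The covariate counts, group means, coefficient choices and which covariates enter the target are illustrative placeholders rather than the exact values used to build our dataset:

import numpy as np

def make_synthetic(n=100_000, n_safe=3, n_proxy=2, n_indirect=2, shift=1.0, p_priv=0.5, seed=0):
    # Synthetic data with a protected attribute D, "safe" covariates independent of D,
    # and "proxy"/"indirect effect" covariates sampled conditionally on D.
    rng = np.random.default_rng(seed)
    d = rng.binomial(1, p_priv, size=n)                        # privileged group indicator
    safe = rng.normal(size=(n, n_safe))                        # independent of D
    proxy = rng.normal(loc=shift * d[:, None], size=(n, n_proxy))
    indirect = rng.normal(loc=shift * d[:, None], size=(n, n_indirect))
    X = np.hstack([safe, proxy, indirect])
    # Log-odds of the favourable outcome: unit-coefficient linear combination of the
    # safe and indirect-effect covariates; conditioning on D through the indirect
    # covariates yields the large SPD, while proxies correlate with D but do not enter.
    logits = safe.sum(axis=1) + indirect.sum(axis=1)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))
    return X, d, y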

7.1.2 Adult Census

The goal is to predict whether a person has an income below or above $50,000 per year. In this dataset, we consider the variable sex as our protected attribute and removed race, marital status, native country and relationship from our models. The other covariates measure financial information, occupation and education.

7.1.3 Private Credit Risk Dataset

In this dataset, we are trying to infer a customer's default probability given curated information on their current account transactions. We are interested in removing bias related to age. We binarize the age variable, dividing our examples into two groups: an “older” (unprivileged) group of people over 50 and a “younger” group of people under 50 years old.

7.2 Results

Figure 2: Fairness metrics for SHAPEnforce, using the penalty in Table 1, plotted with varying regularization strength. Results for the synthetic, Adult and Credit Risk test datasets are shown in (a), (b) and (c) respectively.

Results for SHAPSqueeze on the test datasets are shown in Fig. 1. We observe that across all three datasets increasing $\lambda$ induces fairness, as observed by reductions in the SPD, EOD and the explicability fairness metric. The gradient-scaling constant $C$ was chosen per dataset so that the mean gradients from $\mathcal{L}_{\mathrm{pred}}$ and $\Omega$ in the intermediate stages of training were of the same order of magnitude. For the synthetic data, the explicability fairness metric drops as $\lambda$ is increased, accompanied by a tolerable drop in the AUC and accuracy. Similarly, the SPD is substantially reduced while the EOD is almost eliminated. Increasing $\lambda$ further, the total attribution approaches zero and the model is faithful to our strong “Fairness by Explicability” definition.

We observe the same patterns for the fairness metrics when SHAPSqueeze is applied to the Adult and Credit Risk datasets, with only modest reductions in accuracy, precision and AUC as $\lambda$ increases. Contrastingly, for Adult, we observe an increase in precision as the fairness regularization pushes scores beyond the classification threshold. A similar increase in accuracy is seen in the Credit Risk dataset, which we attribute to the conservative threshold employed. This threshold also results in the SPD and EOD being eliminated at large $\lambda$, as the regularization pushes all of the scores above it. At this point the auditor's coefficient remains appreciably non-zero, demonstrating that even when the SPD and EOD are zero a model may not be explicably fair. This highlights the differences in fairness definition and, in particular, the use of the scores $S$ rather than the outcomes $\hat{Y}$ when measuring explicable fairness. To avoid this scenario one would either reduce $\lambda$ or select a different threshold. At intermediate $\lambda$, the model achieves substantially reduced SPD, EOD and explicability unfairness while remaining performant, with only tolerable drops in the AUC and precision.

The results for SHAPEnforce are presented in Fig. 2. In all cases, we observe the EOD, SPD, AUC and accuracy decrease with increasing $\eta$. For the synthetic data, the accompanying drops in accuracy and AUC are small. Compared to the statistical fairness metrics, we observe smaller improvements in the explicable fairness, and the decreasing trend of the explicability metric is less pronounced and consistent than for SHAPSqueeze. This was expected for two reasons. Firstly, the unregularized AdaBoost model is explicably fairer than XGBoost, so there is less explicable unfairness to remove. Secondly, the heuristic nature of the modification provides no guarantees on explicable fairness, so the magnitude of the reduction is not guaranteed. For the Adult data, the SPD and EOD are reduced with tolerable drops in accuracy and AUC. For the Credit Risk data, we again observe explicable fairness improvements, accompanied by the SPD and EOD being eliminated at the largest $\eta$ values. Similar to SHAPSqueeze, this elimination is due to the regularization pushing all scores below the threshold. In practice one would use a model from an intermediate $\eta$, where the SPD and EOD are substantially reduced while the precision and AUC suffer only modest drops; however, at this point the explicability metric is only partially reduced relative to the unregularized model.

8 Conclusions

In this work, we developed a novel fairness definition, “Fairness by Explicability”, that gives the explanations of an auditor's surrogate model primacy when determining model fairness. We demonstrated how to incorporate this definition into model training using adversarial learning with surrogate models and SHAP values. This approach was implemented through appropriate regularization terms and a bespoke adaptation of AdaBoost. We exemplified these approaches on three datasets, using XGBoost in combination with our regularizations, and connected our choices of surrogate model to the “statistical parity” and “equality of opportunity” differences. In all cases, the models trained were explicably and statistically fairer, yet still performant. This methodology can be readily extended to other interpretability frameworks, such as LIME [LIME2016], with the only constraint for the gradient-based approach being that the surrogate's attributions must be appropriately differentiable. Future work will explore more complex surrogate models and different explicability scores in the proposed framework.

9 Related Work

In recent years, there has been significant work in model interpretability, adversarial learning and fairness-constrained machine learning model training.

Interpretability: Ref. [SHAP2017] provided a unified framework for interpreting model predictions. This framework unified several existing approaches, e.g. LIME [LIME2016] and DeepLIFT [DeepLIFT2017], and can be argued to be the “gold standard” for model interpretability. It provides both local and global measures of feature attribution and, through the KernelSHAP algorithm, is model agnostic. Further work has introduced computationally efficient approximations to the SHAP values of [SHAP2017] for tree-based models [TreeSHAP2019]. Other works in interpretability have focussed on causality. These approaches provide insight into why a decision was made, rather than an explanation of the model's predictive accuracy, and are frequently qualitative in nature. Ref. [CFE2020] is a recent exception, where the counterfactual examples generated obey realistic constraints to ensure practical use and are examined quantitatively through bespoke metrics.

Adversarial Training: Ref. [Beutel2017] used adversarial training to remove EOD, while a framework for learning adversarially fair representations was developed in Ref. [madras18a]. Similarly, in Ref. [IBM2018] an adversarial network [GANs] was used to debias a predictor network; their specific approach compared favourably to that of [Beutel2017].

Training Fair Models: typically, fair learning methodologies have tended to focus on incorporating statistical fairness constraints directly into the model objective function. Ref. [FNN2018] combined neural networks with statistical fairness regularizations, but their form restricts their applicability to neural networks. Similarly, Ref. [Goel2018] trained a fair logistic regression using convex regularizations. These regularizations rely on empirical weights that represent the historical bias and were designed with proportionally fair classification rather than classical fairness measures in mind. Other works have viewed fair model training as a problem of constrained optimization [Zafar2017a, Zafar2017b, Zafar2017c] or have created meta-algorithms for fair classification [MetaLearn2019].

In these works, the approaches to fair learning have tended to focus on fairness metrics associated with more traditional worldviews, with less focus on model explicability. Similarly, to the authors' knowledge, the role of model explicability in fairness has not been used directly in fair model training; instead, research has focussed on the consistency and transparency of explanations. Our work is novel as it places the role of model explicability at the core of a new fairness definition and develops an adversarial learning methodology that is applicable to adaptive boosting and to any model trained via gradient-based optimization. In the adaptive boosting case, our proposed algorithm is fully non-parametric: the adversary can come from any model family provided the corresponding explicability scores, in this case SHAP values, can be computed.

10 Acknowledgements

We thank C. Dhanjal, F. Bellosi, G. Jones and L. Stoddart for their useful suggestions and discussions. We also thank Experian Ltd and J. Campos Zabala for supporting this work.

References