Algorithmic Bias and Regularisation in Machine Learning

by Pádraig Cunningham, et al.

Often, what is termed algorithmic bias in machine learning will be due to historic bias in the training data. But sometimes the bias may be introduced (or at least exacerbated) by the algorithm itself. The ways in which algorithms can actually accentuate bias have not received much attention, with researchers focusing directly on methods to eliminate bias, no matter the source. In this paper we report on initial research to understand the factors that contribute to bias in classification algorithms. We believe this is important because underestimation bias is inextricably tied to regularisation, i.e. measures to address overfitting can accentuate bias.





1 Introduction

Research on bias in Machine Learning (ML) has focused on two issues: how to measure bias and how to ensure fairness [10]. In this paper we examine the contribution of the classifier algorithm to bias. It is clear that there are two main sources of bias in classification:


  • Negative Legacy: the bias is there in the training data, either due to poor sampling, incorrect labeling or discriminatory practices in the past.

  • Underestimation: the classifier underfits the data, thereby focusing on strong signals in the data and missing more subtle phenomena. Thus the classifier accentuates bias that might be present in the data and underestimates the infrequent outcome for the minority group.

In most cases the data (negative legacy) rather than the algorithm itself is the root of the problem. Most fairness research neatly sidesteps this question by focusing on fair outcomes, whatever the source of the problem.

We argue that it is useful to explore the extent to which algorithms accentuate bias because this issue is inextricably tied to regularisation, a central issue in ML. In developing ML models a key concern is to avoid overfitting. Overfitting occurs when the model fits to noise in the training data thus reducing generalisation performance. Regularisation controls the complexity of the model in order to reduce the propensity to overfit. The way regularisation is achieved depends on the model. The complexity of decision trees can be controlled by limiting the number of nodes in the tree. Overfitting in neural networks can be managed by limiting the magnitude of the weights.

2 Background

The connection between regularisation and bias arises from the context in which bias occurs. Consider a scenario where the desirable classification outcome is the minority class (e.g. job offer) and the sensitive feature represents groups in the population where minority groups may have low base rates for the desirable outcome [1]. So samples representing good outcomes for minority groups are scarce. Excessive regularisation causes the model to ignore or under-represent these data.

We are not aware of any research on bias in ML that explores the relationship between underestimation and regularisation. The issue has received some attention ([1, 8]) but has not been specifically explored. Instead, research has addressed algorithmic interventions that ensure fairness as an outcome [12, 2, 10, 13].

Fundamental to all research on bias in ML are the measures used to quantify bias. We define Y to be the outcome (class variable), with Y = 1 the 'desirable' outcome. Ŷ = 1 indicates that the classifier has predicted the desirable outcome. S is a 'sensitive' feature, with S = 0 representing the minority group (e.g. non-white). In this notation the Calders Verwer discrimination score [2] is:

CV = P(Ŷ = 1|S = 1) − P(Ŷ = 1|S = 0)

If there is no discrimination, this score should be zero. In contrast, the disparate impact (DI_S) definition of unfairness [7] is a ratio rather than a difference:

DI_S = P(Ŷ = 1|S = 0) / P(Ŷ = 1|S = 1)

The 80% rule requires DI_S ≥ 0.8, i.e. outcomes for the minority should be within 80% of those for the majority.

For our evaluations we define an underestimation score (US_S), in line with DI_S, that quantifies the underestimation effect described above:

US_S = P(Ŷ = 1|S = 0) / P(Y = 1|S = 0)

If US_S < 1 the classifier is predicting fewer desirable outcomes than are present in the training data.
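These scores can be computed directly from predictions. A minimal sketch in plain Python (the function names are our own, not from the paper):

```python
def rate(vals, mask):
    """Fraction of 1s among entries where mask is True, e.g. P(Yhat=1|S=0)."""
    sel = [v for v, m in zip(vals, mask) if m]
    return sum(sel) / len(sel)

def cv_score(y_pred, s):
    # Calders-Verwer score: P(Yhat=1|S=1) - P(Yhat=1|S=0)
    return rate(y_pred, [si == 1 for si in s]) - rate(y_pred, [si == 0 for si in s])

def di_s(y_pred, s):
    # Disparate impact: P(Yhat=1|S=0) / P(Yhat=1|S=1)
    return rate(y_pred, [si == 0 for si in s]) / rate(y_pred, [si == 1 for si in s])

def us_s(y_true, y_pred, s):
    # Underestimation score: P(Yhat=1|S=0) / P(Y=1|S=0)
    minority = [si == 0 for si in s]
    return rate(y_pred, minority) / rate(y_true, minority)

# Toy illustration: the classifier drops one desirable outcome for the minority.
s      = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
print(cv_score(y_pred, s), di_s(y_pred, s), us_s(y_true, y_pred, s))
```

Here US_S comes out below 1, signalling that the minority's desirable outcomes are underestimated relative to the training data.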

2.1 Illusory Correlation

Algorithmic bias due to underestimation is similar in some respects to the concept of Illusory Correlation in Psychology [3]. The general pattern of Illusory Correlation is shown on the left in Figure 1. People associate the frequent class with the majority and the rare class with the minority; the infrequent class is overestimated for the minority group [5]. For instance, if the infrequent class is antisocial behaviour, its incidence will be over-associated with the minority. If the frequent class is a good credit rating, it will be over-associated with the majority. This over-estimation also happens for the frequent class in ML. However, the impact for the infrequent class and the minority feature is the opposite: the algorithm will likely accentuate the under-representation.

Figure 1: The relationship between Illusory Correlation and ML Bias. If we think of Illusory Correlations as classification errors, both false positives (FPs) and false negatives (FNs) are increased. ML bias, in contrast, moves classification errors in one direction: a tendency to increase FPs will reduce FNs.

In Figure 1 we see the similarities and differences between Illusory Correlation and ML bias. For the majority feature and the frequent class the behaviour is the same: the association is accentuated. However, the impact on the infrequent side of the classification is different: FNs will be reduced.

3 Evaluation

We present preliminary results on three binary classification datasets from the UCI repository. We also include an analysis of an anonymised and reduced version of the ProPublica Recidivism dataset [9, 6]. The UCI datasets are Adult, Wholesale Customers and User Knowledge Modelling (see Table 1).

We use the Adult and Recidivism datasets to demonstrate the relationship between underestimation and regularisation (section 3.2) and then we explore this in more detail in the other two datasets. In this second part of the evaluation, in order to control the propensity for bias, we introduce a sensitive binary feature S, with S = 0 representing the sensitive minority. To show bias due to underestimation we set the incidence of S = 0 to 15% in the desirable class and 30% in the other, i.e. the sensitive group is under-represented in the desirable class. We also report baseline results with the incidence of S = 0 balanced, with the sensitive group occurring 30% of the time in both classes.
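Injecting a sensitive feature in this way can be sketched as follows (a minimal illustration, not the paper's code; the function name and seed are our own, and the default rates follow the 15%/30% setup described above):

```python
import random

def inject_sensitive_feature(y, p_pos=0.15, p_neg=0.30, seed=42):
    """For each sample, draw S = 0 with probability p_pos if the label is the
    desirable class (y == 1) and with probability p_neg otherwise.
    Returns the list of S values (0 = sensitive minority, 1 = majority)."""
    rng = random.Random(seed)
    return [0 if rng.random() < (p_pos if yi == 1 else p_neg) else 1 for yi in y]

# Sanity check on a toy label vector: 1000 desirable, 3000 other.
y = [1] * 1000 + [0] * 3000
s = inject_sensitive_feature(y)
pos_rate = s[:1000].count(0) / 1000   # should be close to 0.15
neg_rate = s[1000:].count(0) / 3000   # should be close to 0.30
print(pos_rate, neg_rate)
```

For the balanced baseline condition, the same helper is called with p_pos = p_neg = 0.30.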

Dataset Samples Features % Positive Train : Test
Adult 48,842 14 25% 2:1
Recidivism 7,214 7 45% 2:1
Knowledge 403 5 26% 1:2
Wholesale 440 7 32% 1:1
Table 1: The datasets used in the evaluation.

We are interested in uncovering situations where this under-representation is accentuated by the classifier. In section 3.3 we look at the untuned performance of seven classifiers from scikit-learn. In section 3.4 we look at the impact of regularisation on neural net performance. In the next section we provide a more formal account of our bias measures.

3.1 Evaluation Measures

If a binary classifier is biased this will show up as a mismatch between the predicted positives and the actual positives (see Table 2). This is likely to happen when the training data is significantly imbalanced, resulting in predictions that underestimate the overall minority class [4].

             Pred Pos   Pred Neg
Actual Pos   TP         FN         P
Actual Neg   FP         TN         N
             PPos       PNeg
Table 2: Confusion matrix for binary classification.

From this perspective, bias is independent of accuracy, so whether predictions are correct or not (True or False) is not relevant. The minority class bias is effectively an underestimation at a class level, i.e.:

US = P(Ŷ = 1) / P(Y = 1) = (TP + FP) / (TP + FN) = PPos / P

When the classifier is biased away from the positive class, this US score is less than 1. The US_S score defined in section 2 measures the same fraction, but restricted to samples where the sensitive feature S = 0:

US_S = P(Ŷ = 1|S = 0) / P(Y = 1|S = 0)

While bias can be considered independently of accuracy, there is an important interplay between bias and accuracy. So the final evaluation measure we consider is the overall accuracy:

Accuracy = (TP + TN) / (TP + FN + FP + TN)
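As a quick check of these two measures (a toy sketch; the confusion-matrix counts are invented for illustration):

```python
def us_and_accuracy(tp, fn, fp, tn):
    """Overall underestimation score and accuracy from confusion-matrix counts.
    US = predicted positives / actual positives = (TP + FP) / (TP + FN)."""
    us = (tp + fp) / (tp + fn)
    acc = (tp + tn) / (tp + fn + fp + tn)
    return us, acc

# 50 actual positives but only 40 predicted positives: US < 1 (underestimation).
us, acc = us_and_accuracy(tp=30, fn=20, fp=10, tn=40)
print(us, acc)
```

Note that US can be below 1 even when accuracy looks respectable, which is exactly the interplay the evaluation below explores.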
3.2 Underestimation in Action

We use the Adult and Recidivism datasets to show the impact of underestimation. In Table 3 we see that in the Adult dataset Females with salaries greater than 50K account for just 4% of cases and in the Recidivism dataset Caucasians are relatively underrepresented among repeat offenders. We will see that this under-representation is accentuated by the classifiers.

Table 3: Summary statistics for the Adult and Recidivism datasets. In both cases a feature/class combination is significantly underrepresented in the data. Key feature-specific percentages are shown in red.

3.2.1 Adult Dataset:

This dataset is much studied in research on bias in ML because there is clear evidence of Negative Legacy [2]. At 33%, females are underrepresented in the dataset. This under-representation is worse in the >50K salary category, where the proportion who are female is lower still.

Figure 2: A demonstration of model bias and underestimation on the Adult dataset. Model F fits the data well with accuracy 86%. Model U underfits but still has accuracy 85%. Underfitting exacerbates the underestimation for Females.

To illustrate underestimation we use a gradient boosting classifier [11]. We build two classifiers, one with just 5 trees (Model U) and one with 50 (Model F). Both models have good accuracy, 85% and 86% respectively. Figure 2(a) shows the actual incidence of Salary > 50K overall and for females. It also shows the incidence predicted by the two models. We can see that both models underestimate the probability of the Salary > 50K class overall. On the right in Figure 2(a) we can see that this underestimation is exacerbated for females. This underestimation is worse in the underfitted model: the actual occurrence of salaries > 50K for females is 11% in the data, while the underfitted model predicts 6%. The extent of this underestimation is quantified in Figure 2(b).

So underestimation is influenced by three things: underfitting, the minority class and minority features. The underfitted model does a poor job of modelling the minority feature (female) in the minority class (>50K). This is not easy to fix because it is often desirable not to allow ML models to overfit the training data.
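The Model U / Model F comparison can be sketched as follows. This is an illustrative run on synthetic data, not the paper's experiment, and it uses scikit-learn's GradientBoostingClassifier rather than the implementation cited above; the data-generation choices are our own:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 4000
# Imbalanced labels: roughly 25% positive, mirroring the Adult >50K rate.
y = (rng.random(n) < 0.25).astype(int)
# Informative but noisy features.
X = y[:, None] + rng.normal(scale=1.5, size=(n, 5))

model_u = GradientBoostingClassifier(n_estimators=5, random_state=0).fit(X, y)
model_f = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

rate_u = model_u.predict(X).mean()   # predicted positive incidence, underfitted
rate_f = model_f.predict(X).mean()   # predicted positive incidence, fitted
print(f"actual {y.mean():.2f}  Model U {rate_u:.2f}  Model F {rate_f:.2f}")
```

On data like this the 5-tree model typically predicts the positive class less often than the 50-tree model does, reproducing the underestimation pattern in miniature.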

3.2.2 Recidivism Dataset:

We use the decision tree classifier from scikit-learn to demonstrate underestimation on the Recidivism dataset. In this case we control overfitting by constraining the size of the tree. The underfitted model (U) has 30 leaf nodes and the other model (F) has 1349 leaves.

Figure 3: A demonstration of model bias and underestimation on the Recidivism dataset. In this case the underfitted model (Model U) has better accuracy but underestimates recidivism for Caucasians.

The picture here is similar, but in this case the underfitted model has better accuracy. This accuracy comes at the price of increased underestimation. The underestimation is particularly bad for the minority feature, with the level of recidivism for Caucasians significantly underestimated. As reported in other analyses [9, 6], the input features do not provide a strong signal for predicting recidivism, so the fitted model does not generalise well to unseen data. On the other hand, the model that is forced to underfit generalises better but fails to capture the Caucasian recidivism pattern.
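Constraining tree size in this way uses scikit-learn's max_leaf_nodes parameter. A minimal sketch on synthetic data (the data and leaf counts here are our own stand-ins, not the Recidivism experiment):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 7))  # 7 features, matching the Recidivism dataset's width
# Weak, noisy signal so an unconstrained tree will fit the noise.
y = (X[:, 0] + rng.normal(scale=2.0, size=n) > 0.2).astype(int)

# Model U: regularised by limiting the tree to 30 leaves.
tree_u = DecisionTreeClassifier(max_leaf_nodes=30, random_state=1).fit(X, y)
# Model F: grown without a leaf limit, so it fits (and overfits) the noise.
tree_f = DecisionTreeClassifier(random_state=1).fit(X, y)

print("U leaves:", tree_u.get_n_leaves(), " F leaves:", tree_f.get_n_leaves())
```

The unconstrained tree grows orders of magnitude more leaves on noisy data, which is precisely the overfitting that the leaf limit trades away, at the risk of underestimation.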

3.3 Baseline Performance of Classifiers

We move on now to look at the impact of a synthetic minority feature injected into the other two datasets. This synthetic feature is set up to be biased (negative legacy), with S = 0 occurring 15% of the time in the desirable class and 30% of the time in the other. We test to see if this bias is accentuated (i.e. US_S < 1) for seven popular classifiers available in scikit-learn. For this assessment, the default parameters for the classifiers are used. There are two exceptions to this: the number of iterations for Logistic Regression and the Neural Network were increased to get rid of convergence warnings.

(a) Wholesale dataset.
(b) Knowledge dataset
Figure 4: The varying impact of underestimation across multiple classifiers. A sensitive feature S = 0 has been added to both datasets (15% in the desirable class and 30% in the majority class).

For the results shown in Figure 4 the main findings are:

  • The tree-based classifiers (Decision Tree, Gradient Boost & Random Forest) all perform very well, showing no bias (or a positive bias), both overall (US) and when we filter for the sensitive feature (US_S).

  • The other classifiers (k-Nearest Neighbour, Naive Bayes, Logistic Regression & Neural Network) all show bias (underestimation), overall and even more so in the context of the sensitive feature.

  • The accentuation effect is evident for the four classifiers that show bias, i.e. they predict even less than 15% of the desirable class for the sensitive feature.

This baseline performance by the tree-based classifiers is very impressive. However, it is important to emphasise that the performance of the other methods can be improved significantly by parameter tuning, as would be normal in configuring an ML system. Finally, it should not be inferred that tree-based methods are likely to be free of underestimation problems. In particular, the Decision Tree implementation in scikit-learn provides a number of mechanisms for regularisation that may introduce underestimation as a side effect.
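The shape of this comparison can be reproduced in outline as follows. This is a sketch on synthetic data with an injected sensitive feature, standing in for the paper's datasets; the data generation and classifier subset are our own choices:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n = 3000
y = (rng.random(n) < 0.3).astype(int)  # ~30% desirable class
# Sensitive feature: S = 0 for 15% of the desirable class, 30% of the rest.
s = np.where(rng.random(n) < np.where(y == 1, 0.15, 0.30), 0, 1)
# Four noisy informative features plus the sensitive feature itself.
X = np.column_stack([y + rng.normal(scale=1.2, size=n) for _ in range(4)] + [s])

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(X, y, s, random_state=2)

for clf in [DecisionTreeClassifier(random_state=2), KNeighborsClassifier(),
            GaussianNB(), LogisticRegression(max_iter=1000)]:
    y_hat = clf.fit(X_tr, y_tr).predict(X_te)
    us = y_hat.mean() / y_te.mean()                          # overall US
    us_s = y_hat[s_te == 0].mean() / y_te[s_te == 0].mean()  # US for S = 0
    print(f"{type(clf).__name__:24s} US={us:.2f}  US_S={us_s:.2f}")
```

Each classifier is scored on the same held-out split, so US and US_S are directly comparable across methods, as in Figure 4.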

(a) 30% in pos. class, 30% in neg.
(b) 15% in pos. class, 30% in neg.
(c) 30% in pos. class, 30% in neg.
(d) 15% in pos. class, 30% in neg.
Figure 5: The impact of underfitting on bias. Higher values of Alpha result in underfitting. Furthermore, when the sensitive attribute is underrepresented in the desirable class (b) & (d) the bias is exacerbated.

3.4 Impact of Underfitting

The standard practice in training a classifier is to ensure against overfitting in order to get good generalisation performance. Kamishima et al. [8] argue that bias due to underestimation arises when a classifier underfits the phenomenon being learned. This will happen when the data available is limited and samples covering the sensitive feature and the desirable class are scarce.

The scikit-learn neural network implementation provides an alpha parameter to control overfitting. It works by providing control over the size of the weights in the model. Constraining the weights reduces the potential for overfitting. The plots in Figure 5 show how underestimation varies with this parameter: high values of alpha cause underfitting. These plots show Accuracy and Underestimation for different values of alpha. For the plots on the left (Figure 5 (a) & (c)) the incidence of the sensitive feature is the same in both the positive and negative class (30%). For the plots on the right ((b) & (d)) the sensitive feature is underrepresented (15%) in the positive class.

In Figure 5 (a) and (c) we see that high values of alpha (i.e. underfitting) result in significant bias. When the base rates for the minority group in the positive and negative classes are the same, the US and US_S rates are more or less the same.

When the prevalence of the sensitive group in the desirable class is lower ((b) & (d)) the bias is exacerbated. It is important to emphasise that a good US_S score simply means that underestimation is not present. There may still be bias against the minority group (i.e. poor CV or DI_S scores).
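An alpha sweep of this kind can be sketched as follows (a minimal illustration on synthetic data; the network size, alpha values and seed are our own assumptions, not the paper's configuration):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
n = 2000
y = (rng.random(n) < 0.3).astype(int)                 # ~30% positive class
X = y[:, None] + rng.normal(scale=1.0, size=(n, 4))   # informative, noisy features

us_by_alpha = {}
for alpha in [1e-4, 1.0, 1e4]:
    # alpha is scikit-learn's L2 penalty on the weights: large alpha shrinks
    # the weights towards zero and forces the network to underfit.
    clf = MLPClassifier(hidden_layer_sizes=(8,), alpha=alpha,
                        max_iter=500, random_state=3).fit(X, y)
    us_by_alpha[alpha] = clf.predict(X).mean() / y.mean()  # overall US

for alpha, us in us_by_alpha.items():
    print(f"alpha={alpha:>8}: US={us:.2f}")
```

At the extreme alpha the shrunken network falls back towards predicting the majority class, driving US towards zero, which is the underfitting-induced underestimation described above.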

4 Conclusions & Future Work

In contrast to what illusory correlation tells us about how humans perceive things, underestimation occurs in ML classifiers because they are inclined to over-predict common phenomena and under-predict things that are rare in the training data. We have shown, on two illustrative datasets, Adult and Recidivism, how under-representation in the data leads to underestimation in the classifiers built on that data. We believe classifier bias due to underestimation is worthy of research because of its close interaction with regularisation. We have demonstrated this interaction on two datasets where we vary the levels of under-representation and regularisation, showing the impact on underestimation. Underfitting data with an under-represented feature in the desirable class leads to increased underestimation (bias) in the classifiers.

We are now exploring how sensitive underestimation is to distribution variations in the class label and the sensitive feature. Our next step is to develop strategies to mitigate underestimation.


  • [1] S. Barocas, M. Hardt, and A. Narayanan (2019) Fairness and machine learning. Note: Cited by: §2, §2.
  • [2] T. Calders and S. Verwer (2010) Three naive bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery 21 (2), pp. 277–292. Cited by: §2, §2, §3.2.1.
  • [3] L. J. Chapman and J. P. Chapman (1969) Illusory correlation as an obstacle to the use of valid psychodiagnostic signs.. Journal of abnormal psychology 74 (3), pp. 271. Cited by: §2.1.
  • [4] N. V. Chawla, A. Lazarevic, L. O. Hall, and K. W. Bowyer (2003) SMOTEBoost: improving prediction of the minority class in boosting. In European conference on principles of data mining and knowledge discovery, pp. 107–119. Cited by: §3.1.
  • [5] F. Costello and P. Watts (2019) The rationality of illusory correlation.. Psychological review 126 (3), pp. 437. Cited by: §2.1.
  • [6] J. Dressel and H. Farid (2018) The accuracy, fairness, and limits of predicting recidivism. Science advances 4 (1), pp. eaao5580. Cited by: §3.2.2, §3.
  • [7] M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian (2015) Certifying and removing disparate impact. In proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp. 259–268. Cited by: §2.
  • [8] T. Kamishima, S. Akaho, H. Asoh, and J. Sakuma (2012) Fairness-aware classifier with prejudice remover regularizer. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 35–50. Cited by: §1, §2, §3.4.
  • [9] J. Larson, S. Mattu, L. Kirchner, and J. Angwin (2016) How we analyzed the compas recidivism algorithm. ProPublica (5 2016) 9. Cited by: §3.2.2, §3.
  • [10] A. K. Menon and R. C. Williamson (2018) The cost of fairness in binary classification. In Conference on Fairness, Accountability and Transparency, pp. 107–118. Cited by: §1, §2.
  • [11] L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A. Gulin (2018) CatBoost: unbiased boosting with categorical features. In Advances in neural information processing systems, pp. 6638–6648. Cited by: §3.2.1.
  • [12] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork (2013) Learning fair representations. In International Conference on Machine Learning, pp. 325–333. Cited by: §2.
  • [13] Y. Zhang and L. Zhou (2019) Fairness assessment for artificial intelligence in financial industry. arXiv preprint arXiv:1912.07211. Cited by: §2.