A Survey on Bias and Fairness in Machine Learning

by   Ninareh Mehrabi, et al.

With the widespread use of AI systems and applications in our everyday lives, it is important to take fairness issues into consideration while designing and engineering these types of systems. Such systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that the decisions do not reflect discriminatory behavior toward certain groups or populations. We have recently seen work in machine learning, natural language processing, and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming aware of the biases that these applications can contain and have attempted to address them. In this survey we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined in order to avoid the existing bias in AI systems. In addition to that, we examined different domains and subdomains in AI showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and how they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields.




1. Introduction

Machine learning algorithms have penetrated every aspect of daily life. Algorithms make movie recommendations, suggest products to buy, and are increasingly used in high-stakes decisions in loan applications (mukerjee2002multi), dating and hiring (cohen2019efficient; bogen2018help). There are clear benefits to algorithmic decision-making; unlike people, machines do not become tired or bored (danziger2011extraneous; o2010routledge), and can take into account orders of magnitude more factors than people can. However, like people, algorithms are vulnerable to biases that render their decisions “unfair” (O'Neil:2016:WMD:3002861; angwin2016machine). In the context of decision-making, fairness is the absence of any prejudice or favoritism toward an individual or a group based on their inherent or acquired characteristics. Thus, an unfair algorithm is one whose decisions are skewed toward a particular group of people. A canonical example comes from a tool used by courts in the United States to make parole decisions. The software, Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), measures the risk that a person will commit another crime in the future. Judges use COMPAS to decide whether to release an offender, or to keep him or her in prison. An investigation into the software found a bias against African-Americans (footnote 1: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing): COMPAS is more likely to assign a higher risk score to African-American offenders than to Caucasians with the same profile. Similar findings have been made in other areas, such as an AI system that judges beauty pageant winners but was biased against darker-skinned contestants (footnote 2: https://www.theguardian.com/technology/2016/sep/08/artificial-intelligence-beauty-contest-doesnt-like-black-people), or facial recognition software in digital cameras that overpredicts Asians as blinking (footnote 3: http://content.time.com/time/business/article/0,8599,1954643,00.html). These biased predictions stem from the hidden or neglected biases in data or algorithms.

In this survey we identify two potential sources of unfairness in machine learning outcomes—those arising from biases in the data and those arising from the algorithms. We review research investigating how biases in data skew what is learned by machine learning algorithms, and nuances in the way the algorithms themselves work that prevent them from making fair decisions—even when the data is unbiased.

We begin the review with several highly visible real-world cases of where unfair machine learning algorithms have led to suboptimal and discriminatory outcomes. We then describe the many types of biases that occur in data and present the different ways that the concept of fairness has been operationalized and studied in literature. We discuss the ways in which these two concepts are coupled. Last, we will focus on different families of machine learning approaches, how fairness manifests differently in each one, and the current state-of-the-art for tackling them, followed by potential areas of future work in each of the domains.

2. Real-World Examples of Algorithmic Unfairness

With the popularity of AI and machine learning over the past decades, and their rapid spread across applications, safety and fairness constraints have become a major concern for researchers and engineers. Machine learning is used in courts to assess the likelihood of a defendant becoming a recidivist. It is used in different medical fields, in childhood welfare systems (pmlr-v81-chouldechova18a), and autonomous vehicles. All of these applications have a direct effect on our lives and can harm our society if not designed and engineered correctly, with consideration for fairness. (osoba2017intelligence) lists applications and the ways these AI systems affect our daily lives with their inherent biases, such as the existence of bias in AI chatbots, employment matching, flight routing, automated legal aid for immigration algorithms, and search and advertising placement algorithms. (howard2018ugly) also discusses specific examples of how bias has infused itself into current AI and robotic systems, such as bias in face recognition applications, voice recognition, and search engines. Therefore, it is important for researchers and engineers to be concerned about the downstream applications and their potential harmful effects when modeling an algorithm or a system. A well-known example is COMPAS, a widely used commercial risk assessment software that was compared to ordinary human judgment in a study and found to be no better than an ordinary person (Dresseleaao5580). It is also interesting to note that although COMPAS uses 137 features, only 7 of those were presented to the people in the study. In (Dresseleaao5580), the authors also argued that COMPAS is not any better than a simple logistic regression model when making decisions. We should think responsibly, and recognize that these tools are used in courts and are actually making decisions which affect people’s lives; therefore, considering fairness constraints is a crucial task while designing and engineering these types of sensitive tools. In another similar study, while investigating sources of group unfairness (unfairness across different groups is defined later), the authors in (tolan2019machine) compared SAVRY, a tool used in risk assessment frameworks that includes human intervention in its process, with automatic machine learning methods in order to see which one is more accurate and fair. Studies of this type should be conducted more frequently, and before the tools are released, in order to avoid doing harm.

Another interesting direction that researchers have taken is introducing tools that can assess the amount of fairness in a tool or system. For example, Aequitas (saleiro2018aequitas) is a toolkit enabling users to test models for several bias and fairness metrics in relation to multiple population subgroups. With the data from its reports, Aequitas helps data scientists, machine learning researchers, and policymakers make well-informed decisions and thereby avoid committing harm and causing any disadvantage toward certain populations. AI Fairness 360 (AIF360) is another toolkit, developed at IBM, to facilitate the transition of fairness research algorithms to an industrial setting and to provide a common framework for fairness researchers to share and evaluate algorithms (bellamy2018ai). These types of toolkits can help learners, researchers, and people working in industry move towards developing fair machine learning applications and away from discriminatory behavior. In addition to COMPAS, discriminatory behavior was also evident in an algorithm that would deliver ads promoting job opportunities in the Science, Technology, Engineering and Math (STEM) fields (lambrecht2018algorithmic). This ad was explicitly intended to be gender-neutral in its delivery. Empirically, however, fewer women than men saw the ad. This gender imbalance arose because younger women are considered a prized demographic and are more expensive to show ads to. The optimization algorithm thus delivered ads in a discriminatory way, although its original intention was to be gender-neutral. Bias in facial recognition systems (raji2019actionable) and recommender systems (schnabel2016recommendations) has also been widely studied and evaluated, and in many cases these systems have been shown to discriminate against certain populations and subgroups. In order to address the bias issue in these applications, it is important for us to know where these biases are coming from and what we can do to prevent them.

3. Bias in Data

Figure 1. Illustration of biases in data. Red line shows the regression (MLR) for the entire population, while dashed green lines are regressions for each subgroup, and the solid green line is the unbiased regression. (a) When all subgroups are of equal size, then MLR shows a positive relationship between the outcome and the independent variable. (b) Regression shows almost no relationship in less balanced data. The relationships between variables within each subgroup, however, remain the same. (Credit: Nazanin Alipourfard)

Data, especially big data, is often heterogeneous, generated by subgroups with their own characteristics and behaviors. The heterogeneities, some of which are described below, can bias the data. A model learned on biased data may lead to unfair and inaccurate predictions. To illustrate how biases in data affect machine learning, consider a hypothetical nutrition study which measured how the outcome, body mass index (BMI), changes as a function of daily pasta calorie intake (Figure 1). Regression analysis (solid red line) shows a positive relationship in the population between these variables. The positive trend suggests that increased pasta consumption is associated with higher BMI. However, unbeknown to researchers, the study population was heterogeneous, composed of subgroups that vary in their fitness level: people who did not exercise, people with normal activity levels, and athletes. When data is disaggregated by fitness level, the trends within each subgroup are negative (dashed green lines), leading to the conclusion that increased pasta consumption is, in fact, associated with a lower BMI. Recommendations for pasta consumption that come from the naive analysis are opposite to those coming from a more careful analysis that accounts for the differences between subgroups.
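The reversal described above can be reproduced with a few lines of code. The following is a minimal sketch, not the analysis behind Figure 1: the data points, the subgroup names, and the helper `ols_slope` are all invented for illustration, and the data is noise-free so that the effect is easy to see.

```python
# Hypothetical, noise-free data: (daily pasta calories, BMI) per fitness subgroup.
# All numbers are invented for illustration; they do not come from any study.
groups = {
    "athletes": [(100, 22.5), (200, 22.0), (300, 21.5)],
    "normal activity": [(300, 25.5), (400, 25.0), (500, 24.5)],
    "no exercise": [(500, 28.5), (600, 28.0), (700, 27.5)],
}


def ols_slope(points):
    """Slope of the ordinary least-squares line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Regression on the pooled population ignores the subgroup structure.
pooled = [point for points in groups.values() for point in points]
pooled_slope = ols_slope(pooled)

# Regression within each subgroup (the disaggregated analysis).
subgroup_slopes = {name: ols_slope(points) for name, points in groups.items()}

print(f"pooled slope: {pooled_slope:+.4f}")
for name, slope in subgroup_slopes.items():
    print(f"{name:>16} slope: {slope:+.4f}")
```

With this invented data the pooled slope is positive while every subgroup slope is negative, because the subgroups that eat more pasta also start from a higher baseline BMI.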

3.1. Types of Bias

Bias in data can exist in many shapes and forms, some of which can lead to unfairness in different downstream learning tasks. In (suresh2019framework), authors talk about sources of bias in machine learning with their categorizations and descriptions in order to motivate future solutions to each of the sources of bias introduced in the paper. In (olteanu2016social), authors prepare a complete list of different types of biases with their corresponding definitions that exist in different cycles from data origins to its collection and its processing. Here we will reiterate some of the most general and important sources of bias introduced in these two papers and also add in some work from other existing research papers. Additionally, we will also introduce a different categorization and grouping of these definitions later in the paper.

  1. Historical Bias. Historical bias is a normative concern with the world as it is; it is a fundamental, structural issue with the first step of the data generation process and can exist even given perfect sampling and feature selection (suresh2019framework). An example of this type of bias can be found in a 2018 image search result where searching for women CEOs ultimately resulted in fewer female CEO images due to the fact that only 5% of Fortune 500 CEOs were women, which would cause the search results to be biased towards male CEOs (suresh2019framework). These search results were of course reflecting the reality, but whether or not the search algorithms should reflect this reality is an issue worth considering.

  2. Representation Bias. Representation bias arises when defining and sampling from a population (suresh2019framework). Lacking geographical diversity in datasets like ImageNet (as shown in Figures 3 and 4) is an example of this type of bias (suresh2019framework). This demonstrates a bias towards Western countries.

  3. Measurement Bias. Measurement bias arises when choosing and measuring the particular features of interest (suresh2019framework). An example of this type of bias was observed in the recidivism risk prediction tool COMPAS, where prior arrests and friend/family arrests were used as proxy variables to measure “crime” or some underlying notion of “riskiness”, and these can themselves be viewed as mismeasured proxies. This is because minority communities are often more highly policed and have higher arrest rates, so there is a different mapping from crime to arrest for people from these communities (suresh2019framework).

  4. Evaluation Bias. Evaluation bias occurs during model iteration and evaluation (suresh2019framework). This includes the use of inappropriate and disproportionate benchmarks for evaluation of applications such as Adience and IJB-A benchmarks. These benchmarks are used in the evaluation of facial recognition systems that were biased toward skin color and gender (pmlr-v81-buolamwini18a), and can serve as examples for this type of bias (suresh2019framework).

  5. Aggregation Bias. Aggregation bias arises when flawed assumptions about the population affect model definition (suresh2019framework). An example of this type of bias can be seen in clinical aid tools. Consider diabetes patients who have known differences in associated complications across ethnicities, or more specifically, HbA1c levels that are widely used to diagnose and monitor diabetes differ in complex ways across ethnicities and genders. Therefore, because of these factors and their different meanings and importance within different subpopulations, a single model is unlikely to be best-suited for any group in the population even if they are equally represented in the training data (suresh2019framework). Any general assumptions about different populations can result in aggregation bias.

  6. Population Bias. Population bias arises from differences in demographics or other user characteristics between a user population represented in a dataset or platform and a target population (olteanu2016social). An example of this type of bias can arise from different user demographics on different social platforms, such as women being more likely to use Pinterest, Facebook, and Instagram, while men are more active in online forums like Reddit or Twitter. More such examples and statistics related to social media use among young adults according to gender, race, ethnicity, and parental educational background can be found in (10.1111/j.1083-6101.2007.00396.x).

  7. Simpson’s Paradox. Simpson’s paradox (blyth1972simpson) can bias the analysis of heterogeneous data that is composed of subgroups or individuals with different behaviors. According to the paradox, an association observed in data that has been aggregated over an entire population may be quite different from, and even opposite to, associations found in the underlying subgroups. One of the better-known examples of Simpson’s paradox arose during the gender bias lawsuit against UC Berkeley (bickel1975sex). Analysis of graduate school admissions data appeared to reveal a statistically significant bias against women, a smaller fraction of whom were being admitted to graduate programs. However, when admissions data was disaggregated by department, women applicants had parity and even a small advantage over men. The paradox arose because women tended to apply to departments with lower admission rates for both genders. Simpson’s paradox has been observed in a variety of domains, including biology (Chuang2009), psychology (kievit2013simpson), astronomy (minchev2019yule), and computational social science (Lerman2018).

  8. Longitudinal Data Fallacy. Observational studies often treat cross-sectional data as if it were longitudinal, which may create biases due to Simpson’s paradox. As an example, analysis of bulk Reddit data (Barbosa2016) found that average comment length decreased over time. However, bulk data represented a cross-sectional snapshot of the population, which in reality contained different cohorts who joined Reddit in different years. When data was disaggregated by cohorts, the comment length within each cohort was found to increase over time.

  9. Sampling Bias. Sampling bias arises due to non-random sampling of subgroups. As a consequence of sampling bias, the trends estimated for one population may not generalize to data collected from a new population. For intuition, consider again the example in Figure 1. Suppose the next time the study is conducted, one of the subgroups is sampled more frequently than the rest. The positive trend found by the regression model in the first study almost completely disappears (solid red line in plot on the right), although the subgroup trends (dashed green lines) are unaffected.

  10. Behavioral Bias. Behavioral bias arises from differences in user behavior across platforms or contexts, or across users represented in different datasets (olteanu2016social). An example of this type of bias can be observed in (miller2016blissfully), where authors show how differences in emoji representations among platforms can result in different reactions and behavior from people and sometimes even lead to communication errors.

  11. Content Production Bias. Content production bias arises from lexical, syntactic, semantic, and structural differences in the contents generated by users (olteanu2016social). An example of this type of bias can be seen in (2cac1d4b56eb4e43ad55a690a5595a91), where the differences in use of language across different gender and age groups are discussed. The differences in use of language can also be seen across and within countries and populations.

  12. Linking Bias. Linking bias arises from differences in the attributes of networks obtained from user connections, interactions, or activity (olteanu2016social). In (mehrabi2019debiasing) authors show how social networks can be biased toward low-degree nodes when only considering the links in the network and not considering the content and behavior of users in the network. (wilson2009user) also shows that user interactions deviate significantly from social link patterns in terms of factors such as time in the network, method of interaction, and types of users involved. The differences and biases in the networks can be a result of many factors, such as network sampling, as shown in (gonzalez2014assessing; morstatter2013sample), which can change the network measures and cause different types of problems.

  13. Temporal Bias. Temporal bias arises from differences in populations and behaviors over time (olteanu2016social). An example can be observed in Twitter where people talking about a particular topic start using a hashtag at some point to capture attention, then continue discussing the event without the hashtag (tufekci2014big; olteanu2016social).

  14. Popularity Bias. Items that are more popular tend to be exposed more. However, popularity metrics are subject to manipulation—for example, by fake reviews or social bots (nematzadeh2017algorithmic). An example of this type of bias can be seen in search engines (nematzadeh2017algorithmic; 816269) or recommendation systems where popular objects would be presented more to the public. But this presentation may not be a result of good quality; instead, it may be due to other biased factors.

  15. Algorithmic Bias. Algorithmic bias is added by the algorithm itself and is not present in the input data (Baeza-Yates:2018:BW:3229066.3209581).

  16. User Interaction Bias. User interaction bias is a significant source of bias, not only on the Web, and it arises from two notable sources: the user interface and the user’s own self-selected, biased interaction (Baeza-Yates:2018:BW:3229066.3209581). This type of bias can be influenced by other types and subtypes, such as Presentation and Ranking biases.

    Presentation Bias. Presentation bias is a result of how information is presented (Baeza-Yates:2018:BW:3229066.3209581). For example, on the Web only content that the user sees can receive clicks, while everything else gets no clicks. And it could be the case that the user does not see all the information on the Web (Baeza-Yates:2018:BW:3229066.3209581).

    Ranking Bias. The top-ranked result will attract more clicks than the others because it is both the most relevant and also ranked in the first position. This bias affects search engines (Baeza-Yates:2018:BW:3229066.3209581) and crowdsourcing applications (lerman2014leveraging).

  17. Social Bias. Social bias arises when content coming from other people affects our judgment (Baeza-Yates:2018:BW:3229066.3209581). An example of this type of bias can be a case where we want to rate or review an item with a low score, but when influenced by other high ratings, we change our scoring thinking that perhaps we are being too harsh (wang2014amazon; Baeza-Yates:2018:BW:3229066.3209581).

  18. Emergent Bias. Emergent bias arises in a context of use with real users. This bias typically emerges some time after a design is completed, as a result of changing societal knowledge, population, or cultural values (Friedman:1996:BCS:230538.230561). This type of bias is more likely to be observed in user interfaces, since interfaces by design seek to reflect the capacities, character, and habits of prospective users (Friedman:1996:BCS:230538.230561). This type of bias can itself be divided into more subtypes, as discussed in detail in (Friedman:1996:BCS:230538.230561).

  19. Self-Selection Bias. Self-selection bias (footnote 4: https://data36.com/statistical-bias-types-explained/) is a subtype of the selection or sampling bias in which subjects of the research select themselves. An example of this type of bias can be observed in situations where survey takers decide for themselves whether they can appropriately participate in a study. For instance, in a survey study about smart or successful students, some less successful students might think that they are successful enough to take the survey, which would then bias the outcome of the analysis. In fact, the chances of this situation happening are high, as the more successful students probably would not spend time filling out surveys, which would increase the risk of self-selection bias.

  20. Omitted Variable Bias. Omitted variable bias (footnote 4) occurs when one or more important variables are left out of the model. An example for this case would be when someone designs a model to predict, with relatively high accuracy, the annual percentage rate at which customers will stop subscribing to a service, but soon observes that a big chunk of users are canceling their subscription without any warning from the designed model. Now imagine that the reason for canceling the subscriptions is a strong competitor entering the market and offering the same solution, but for half the price. This is something the model was not ready for, so the presence of the competitor is an omitted variable in this case.

  21. Cause-Effect Bias. Cause-effect bias (footnote 4) can happen as a result of the fallacy that correlation implies causation. An example of this type of bias can be observed in a situation where a data analyst in a company wants to analyze how successful a new loyalty program is. The analyst sees that customers who signed up for the loyalty program are spending more money in the company’s e-commerce store than those who did not. It is going to be problematic if the analyst immediately jumps to the conclusion that the loyalty program is successful, since it is also possible that only those more committed or loyal customers are interested in the loyalty program in the first place, and they might have planned to spend more anyway. This type of bias can have serious consequences due to its nature and the roles it can play in sensitive decision-making policies.

  22. Observer Bias. Observer bias (footnote 4) happens when researchers subconsciously project their expectations onto the research. This type of bias can happen when researchers (unintentionally) influence participants (during interviews and surveys) or when they cherry pick participants or statistics that will favor their research.

  23. Funding Bias. Funding bias (footnote 4) happens when the results of a scientific study are biased in a way that supports the financial sponsor of the research. An example of this type of bias can be observed when employees of a company report biased results in their data and statistics in order to keep the funding agencies or other parties satisfied.
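The admissions example given for Simpson’s paradox in the list above can be made concrete with a small worked example. This is only a sketch under stated assumptions: the two departments, the applicant counts, and the helper names are invented and are not the figures from (bickel1975sex).

```python
# Hypothetical (applicants, admitted) counts per department and gender.
# The numbers are invented for illustration, not the UC Berkeley data.
admissions = {
    "dept_A": {"men": (80, 48), "women": (20, 13)},  # less selective department
    "dept_B": {"men": (20, 4), "women": (80, 20)},   # more selective department
}


def rate(applied, admitted):
    """Admission rate as a fraction of applicants."""
    return admitted / applied


def aggregate_rate(gender):
    """Admission rate for one gender after pooling all departments."""
    applied = sum(dept[gender][0] for dept in admissions.values())
    admitted = sum(dept[gender][1] for dept in admissions.values())
    return rate(applied, admitted)


overall = {gender: aggregate_rate(gender) for gender in ("men", "women")}
per_dept = {
    dept: {gender: rate(*counts) for gender, counts in by_gender.items()}
    for dept, by_gender in admissions.items()
}

print("overall:", overall)
for dept, rates in per_dept.items():
    print(dept, rates)
```

In this invented data women are admitted at a higher rate than men in each department, yet at a lower rate overall, because most women apply to the department with the lower admission rate.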

Figure 2. Bias definitions in the data, algorithm, and user interaction feedback loop are placed on their most appropriate arrows.

Existing work has tried to categorize these bias definitions into groups, such as definitions falling solely under data or user interaction. However, we believe that due to the existence of the feedback loop phenomenon, a situation in which the trained machine learning model informs decisions that then affect the data collected for future iterations of the training process (chouldechova2018frontiers), these definitions are intertwined, and we need a categorization that closely models this situation. This feedback loop exists not only between the data and the algorithm; some work has also analyzed the loop between the algorithms and user interaction (chaney2018algorithmic). Inspired by these papers, we modeled our categorization of bias definitions as shown in Figure 2, and placed each definition on the arrow of the loop where we thought it was most effective. We emphasize again that these definitions are intertwined, and one should consider how they affect each other and this cycle, and try to address them accordingly.
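To give some intuition for how such a loop can amplify a small initial difference, consider the following toy simulation. It is a deliberately simplified sketch, not a model from (chouldechova2018frontiers) or (chaney2018algorithmic): the function `simulate_feedback_loop`, the item names, and the assumption that users click whatever single item is shown are all hypothetical.

```python
def simulate_feedback_loop(initial_clicks, rounds):
    """Toy data -> algorithm -> user interaction loop.

    Assumptions (hypothetical): the algorithm shows only the item with the
    most recorded clicks, and the user clicks whatever item is shown.
    """
    clicks = dict(initial_clicks)
    for _ in range(rounds):
        shown = max(clicks, key=clicks.get)  # algorithm: rank by logged clicks
        clicks[shown] += 1                   # user: can only click what is shown
        # The updated click log becomes the data for the next round.
    return clicks


# Two items of equal quality; item_a happens to start one click ahead.
start = {"item_a": 6, "item_b": 5}
end = simulate_feedback_loop(start, rounds=100)

share_before = start["item_a"] / sum(start.values())
share_after = end["item_a"] / sum(end.values())
print(end, round(share_before, 3), round(share_after, 3))
```

Under these assumptions the item that starts one click ahead collects every later click, so the logged data increasingly reflects the algorithm’s earlier choices rather than any difference in quality.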

3.2. Data Bias Examples

There are multiple ways that discriminatory bias can seep into data. For instance, using unbalanced data can create biases against underrepresented groups. In (pmlr-v81-buolamwini18a), the authors show that datasets like IJB-A and Adience are imbalanced and contain mainly light-skinned subjects (79.6% for IJB-A and 86.2% for Adience). This can bias the analysis against dark-skinned groups, who are underrepresented in the data. In another instance, the way we use and analyze our data can create bias when we do not consider different subgroups in the data. In (pmlr-v81-buolamwini18a), the authors also show that considering only male-female groups is not enough, but there is also a need to use race to further subdivide the gender groups into light-skin females, light-skin males, dark-skin males, and dark-skin females. It is only in this case that we can clearly observe the bias against dark-skin females, as previously the results for dark-skin males would compensate for those of dark-skin females and would hide the underlying bias against this subgroup. (zou2018ai) also analyzes some examples of the biases that can exist in the data and algorithms and offers some recommendations and suggestions toward mitigating these issues. These data biases can be more dangerous in other sensitive applications. For example, in medical domains there are many instances in which the data studied and used are skewed toward certain populations, which can have dangerous consequences for the underrepresented communities. (doi:10.1056/NEJMsa1507092) showed how exclusion of African-Americans resulted in their misclassification in clinical studies, so the authors advocate sequencing the genomes of diverse populations in the data to prevent harm to underrepresented populations. Authors in (shawfurther) studied the 23andMe genotype dataset and found that out of 2,399 individuals who have openly shared their genotypes in public repositories, 2,098 (87%) are European, while only 58 (2%) are Asian and 50 (2%) African. Other such studies were conducted in (10.1093/aje/kwx246), which states that UK Biobank, a large and widely used genetic dataset, is not representative of the sampling population, and that there is evidence of a “healthy volunteer” selection bias. (vickers2014enhancing) has other examples of studies on existing biases in the data used in the medical domain.
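The point about subdividing gender groups by skin type can be illustrated by computing accuracy at different levels of aggregation. The sketch below uses invented counts and a hypothetical helper `accuracy_by`; it does not reproduce the results reported in (pmlr-v81-buolamwini18a).

```python
from collections import defaultdict

# Hypothetical (correct, total) classification counts per intersectional
# subgroup. The numbers are invented for illustration only.
results = {
    ("female", "lighter"): (95, 100),
    ("male", "lighter"): (99, 100),
    ("female", "darker"): (70, 100),
    ("male", "darker"): (96, 100),
}


def accuracy_by(results, key_fn):
    """Accuracy after merging subgroups that share the same key."""
    totals = defaultdict(lambda: [0, 0])
    for subgroup, (correct, total) in results.items():
        key = key_fn(subgroup)
        totals[key][0] += correct
        totals[key][1] += total
    return {key: correct / total for key, (correct, total) in totals.items()}


by_gender = accuracy_by(results, lambda g: g[0])  # gender only
by_skin = accuracy_by(results, lambda g: g[1])    # skin type only
by_both = accuracy_by(results, lambda g: g)       # intersectional subgroups

for name, table in (("gender", by_gender), ("skin", by_skin), ("both", by_both)):
    print(name, table)
```

With these invented counts, the lowest accuracy under either single-attribute grouping is noticeably higher than the accuracy for the darker-skinned female subgroup, which only becomes visible in the intersectional breakdown.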

Figure 3. Fraction of Open Images and ImageNet images from each country. In both datasets, top represented locations include the US and Great Britain. Countries are represented by their two-letter ISO country codes (shankar2017no).
Figure 4. Distribution of the geographically identifiable images in the Open Images data set, by country. Almost a third of the data in their sample was US-based, and 60% of the data was from the six most represented countries across North America and Europe, from (shankar2017no).

(article) also looks at machine-learning models and data utilized in medical fields, and writes about the disparate impacts of artificial intelligence in health care. This issue not only exists in medical fields, but popular machine-learning datasets that serve as a base for most of the developed algorithms and tools can also be biased—which can be harmful to the downstream applications that are based on these datasets. For instance, ImageNet and Open Images are two widely used datasets in machine-learning. In (shankar2017no), researchers studied and showed that these datasets suffer from representation bias and advocate for the need to incorporate geo-diversity and inclusion while creating such datasets.

4. Algorithmic Fairness

Fighting against bias and discrimination has a long history in philosophy and psychology, and recently in machine-learning. However, in order to be able to fight against discrimination and achieve fairness, one should first define the notion of fairness. Philosophy and psychology have tried to define the concept of fairness long before computer science started exploring it. The fact that there is still no universal definition of fairness speaks for itself. Different preferences and outlooks in different cultures lend a preference to different ways of looking at fairness, which makes it harder to come up with just a single definition that is acceptable to everyone in a situation. Indeed, even in computer science, where most of the work on proposing new fairness constraints for algorithms has come from the West, and a lot of these papers use the same datasets and problems to show how their constraints perform, there is still no clear agreement on which constraints are the most appropriate for those problems.

4.1. Types of Discrimination

Broadly, fairness is the absence of any prejudice or favoritism towards an individual or a group based on their inherent or acquired characteristics in the context of decision-making. Even though fairness is an incredibly desirable quality in society, it can be surprisingly difficult to achieve in practice. In order to understand how we can have so many definitions of fairness, it is also crucial to understand the different kinds of discrimination that may occur.

  1. Direct Discrimination. Direct discrimination occurs when individuals receive less favorable treatment explicitly based on their protected attributes (ijcai2017-549). Typically, the law identifies certain traits on the basis of which it is illegal to discriminate, and it is usually these traits that are considered “protected” or “sensitive” attributes in the computer science literature. A list of some of these protected attributes, as defined by the Fair Housing Act (FHA) and the Equal Credit Opportunity Act (ECOA), is provided in Table 4 (chen2019fairness).

  2. Indirect Discrimination. Indirect discrimination refers to a situation where the treatment is based on apparently neutral, non-protected attributes but still results in unjustified distinctions against individuals from the protected group. For example, the residential zip code of an individual can be used for making decisions such as granting a loan. Although a zip code is apparently a neutral attribute, it may correlate with race due to the racial composition of residential areas. Therefore, the use of zip code may indirectly lead to racial discrimination (ijcai2017-549).

  3. Systemic Discrimination. Systemic discrimination refers to policies, customs, or behaviors that are a part of the culture or structure of an organization that may perpetuate discrimination against certain subgroups of the population (unitedequal). For example, a restaurant fulfilling customers’ requests that lead to discriminatory placement of employees would be systemic discrimination. (rivera2012hiring) found that employers overwhelmingly preferred competent candidates that were culturally similar to them personally, and shared similar experiences and hobbies. If the decision-makers happen to belong overwhelmingly to certain subgroups, this may result in discrimination against competent candidates that do not belong to these subgroups.

  4. Statistical Discrimination. Statistical discrimination is a phenomenon where decision-makers use average group statistics to judge an individual belonging to that group. It usually occurs when the decision-makers (e.g., employers, or law enforcement officers) use an individual’s obvious, recognizable characteristics as a proxy for either hidden or more-difficult-to-determine characteristics, that may actually be relevant to the outcome (phelps1972statistical).

  5. Explainable Discrimination. In many real-world cases, if the difference in decisions can be justified, it is not considered illegal discrimination. Part of the difference in the probability of acceptance for different groups may be objectively explainable by other attributes, and differences in the treatment of sensitive groups that can be explained in this way are considered tolerable (Kamiran2013). For instance, authors in (Kamiran2013) state that in the UCI Adult dataset (Asuncion+Newman:2007), a widely used dataset in the fairness domain, females on average have a lower annual income than males. However, this is because females work fewer hours per week on average than males. Work hours per week gives a good justification for lower income and is an attribute that needs to be considered. If we make decisions without considering working hours, such that males and females average the same income, this will lead to reverse discrimination, as it would result in male employees being assigned a lower salary than females. Therefore, explainable discrimination is acceptable and legal, as it can be explained through other attributes like working hours. In (Kamiran2013), authors present a methodology for quantifying the explainable differences in treatment and the illegal discrimination in data. They argue that techniques which do not take into account the explainable part of the discrimination may tend to overshoot, and thus introduce a reverse discrimination which is equally undesirable. They explain how to measure discrimination in data or in the decisions output by a classifier by explicitly considering explainable and illegal discrimination.

  6. Unexplainable Discrimination. In contrast to explainable discrimination, there is unexplainable discrimination in which the discrimination towards a group is unjustified and therefore considered illegal. Authors in (Kamiran2013) also present the local techniques that remove exactly the illegal or unexplainable discrimination, allowing the differences in decisions to be present as long as they are explainable. These techniques preprocess the training data in such a way that it no longer contains illegal discrimination. After preprocessing, classifiers that are trained using this data are expected not to capture the illegal discrimination.

4.2. Definitions of Fairness

In (binns2018fairness), authors studied fairness definitions in political philosophy and tried to tie them to machine learning. Authors in (hutchinson201950) studied the 50-year history of fairness definitions in the areas of education and machine learning. In (verma2018fairness), authors listed and explained some of the definitions used for fairness in algorithmic classification problems. In (saxena2019fairness), authors studied the general public’s perception of some of these fairness definitions from the computer science literature. Here we restate some of the most widely used definitions, along with explanations inspired by (verma2018fairness).

Definition 1. (Equalized Odds). A predictor $\hat{Y}$ satisfies equalized odds with respect to protected attribute $A$ and outcome $Y$ if $\hat{Y}$ and $A$ are independent conditional on $Y$: $P(\hat{Y}=1 \mid A=0, Y=y) = P(\hat{Y}=1 \mid A=1, Y=y)$, $y \in \{0,1\}$ (hardt2016equality). This means that the probability of a person in the positive class being correctly assigned a positive outcome and the probability of a person in the negative class being incorrectly assigned a positive outcome should both be the same for the protected and unprotected (e.g., female and male) group members. In other words, the equalized odds definition states that the protected and unprotected groups should have equal true positive rates and equal false positive rates.

Definition 2. (Equal Opportunity). A binary predictor $\hat{Y}$ satisfies equal opportunity with respect to $A$ and $Y$ if $P(\hat{Y}=1 \mid A=0, Y=1) = P(\hat{Y}=1 \mid A=1, Y=1)$ (hardt2016equality). This means that the probability of a person in the positive class being assigned a positive outcome should be equal for both protected and unprotected (e.g., female and male) group members. In other words, the equal opportunity definition states that the protected and unprotected groups should have equal true positive rates.

Definition 3. (Demographic Parity). Also known as statistical parity. A predictor $\hat{Y}$ satisfies demographic parity if $P(\hat{Y} \mid A=0) = P(\hat{Y} \mid A=1)$ (NIPS2017_6995; Dwork:2012:FTA:2090236.2090255). The demographic parity definition states that people in both protected and unprotected (e.g., female and male) groups should have equal probability of being assigned a positive outcome.
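
As a rough illustration of Definitions 1 to 3, the sketch below computes per-group true positive, false positive, and positive prediction rates from binary labels and predictions, and reports the gap between two groups for each criterion. The function names (`group_rates`, `fairness_gaps`) and the toy data are assumptions made for this example and do not come from any of the cited works.

```python
def group_rates(y_true, y_pred, group):
    """Return {group_value: {"tpr", "fpr", "positive_rate"}} for binary 0/1 inputs.

    Assumes every group contains at least one positive and one negative example.
    """
    rates = {}
    for g in sorted(set(group)):
        idx = [i for i, a in enumerate(group) if a == g]
        pos = [i for i in idx if y_true[i] == 1]
        neg = [i for i in idx if y_true[i] == 0]
        rates[g] = {
            "tpr": sum(y_pred[i] for i in pos) / len(pos),
            "fpr": sum(y_pred[i] for i in neg) / len(neg),
            "positive_rate": sum(y_pred[i] for i in idx) / len(idx),
        }
    return rates


def fairness_gaps(y_true, y_pred, group):
    """Absolute differences between groups 0 and 1; zero means the criterion holds."""
    r = group_rates(y_true, y_pred, group)
    tpr_gap = abs(r[0]["tpr"] - r[1]["tpr"])
    fpr_gap = abs(r[0]["fpr"] - r[1]["fpr"])
    return {
        "equal_opportunity": tpr_gap,              # Definition 2: equal TPR
        "equalized_odds": max(tpr_gap, fpr_gap),   # Definition 1: equal TPR and FPR
        "demographic_parity": abs(r[0]["positive_rate"] - r[1]["positive_rate"]),
    }


# Toy data (hypothetical): A is the protected attribute.
A      = [0, 0, 0, 0, 1, 1, 1, 1]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
gaps = fairness_gaps(y_true, y_pred, A)
```

In this toy data both groups receive positive predictions at the same rate, so demographic parity holds, while the true and false positive rates differ, so equalized odds and equal opportunity do not. The criteria can therefore disagree on the same predictor.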

Definition 4. (Fairness Through Awareness). An algorithm is fair if it gives similar predictions to similar individuals. In other words, any two individuals who are similar with respect to a similarity (distance) metric defined for a particular task should be classified similarly (NIPS2017_6995; Dwork:2012:FTA:2090236.2090255).
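
One way to make Definition 4 concrete is a Lipschitz-style check: the difference between two individuals' scores should not exceed their distance under the task-specific metric. The sketch below assumes scalar scores, uses the absolute difference as the distance between outputs, and takes the similarity metric as a given function; the helper name `lipschitz_violations` and all data are hypothetical.

```python
from itertools import combinations


def lipschitz_violations(individuals, score, distance):
    """Return index pairs (i, j) whose scores differ by more than their distance."""
    violations = []
    for (i, x), (j, y) in combinations(enumerate(individuals), 2):
        if abs(score(x) - score(y)) > distance(x, y):
            violations.append((i, j))
    return violations


# Hypothetical task: each individual is described by one feature in [0, 1].
people = [0.20, 0.25, 0.90]


def distance(x, y):
    """Assumed task-specific similarity metric."""
    return abs(x - y)


def smooth_score(x):
    """Changes slowly, so similar individuals get similar scores."""
    return 0.5 * x


def threshold_score(x):
    """Hard cut-off that separates two very similar individuals."""
    return 1.0 if x >= 0.22 else 0.0


smooth_violations = lipschitz_violations(people, smooth_score, distance)
threshold_violations = lipschitz_violations(people, threshold_score, distance)
```

The hard threshold treats the first two individuals very differently even though they are close under the metric, which is the situation this definition is meant to rule out. In practice the difficult part is choosing the metric, which this sketch simply assumes.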

Definition 5. (Fairness Through Unawareness). An algorithm is fair as long as any protected attributes A are not explicitly used in the decision-making process (NIPS2017_6995; grgic2016case).

Definition 6. (Treatment Equality). Treatment equality is achieved when the ratio of false negatives and false positives is the same for both protected group categories (berk2018fairness).

Definition 7. (Test Fairness). A score $S = S(x)$ is test-fair (well-calibrated) if it reflects the same likelihood of recidivism irrespective of the individual’s group membership, $R$. That is, if for all values of $s$, $P(Y=1 \mid S=s, R=b) = P(Y=1 \mid S=s, R=w)$ (chouldechova2017fair). In other words, the test fairness definition states that for any predicted probability score $S$, people in both protected and unprotected (e.g., female and male) groups should have equal probability of truly belonging to the positive class.
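
A minimal sketch of how test fairness might be checked empirically: for each score value, compare the observed fraction of positives across groups. It assumes scores have already been discretized into a few values; the function names and the data are hypothetical.

```python
from collections import defaultdict


def calibration_table(scores, y_true, group):
    """Map each score value s to {group: empirical P(Y=1 | S=s, R=group)}."""
    counts = defaultdict(lambda: [0, 0])  # (s, r) -> [positives, total]
    for s, y, r in zip(scores, y_true, group):
        counts[(s, r)][0] += y
        counts[(s, r)][1] += 1
    table = defaultdict(dict)
    for (s, r), (pos, total) in counts.items():
        table[s][r] = pos / total
    return dict(table)


def max_calibration_gap(table):
    """Largest between-group difference in P(Y=1 | S=s) over all score values."""
    return max(max(v.values()) - min(v.values()) for v in table.values())


# Hypothetical discretized risk scores for two groups, "b" and "w".
scores = [0.2, 0.2, 0.2, 0.2, 0.8, 0.8, 0.8, 0.8]
group  = ["b", "b", "w", "w", "b", "b", "w", "w"]
y_true = [0,   1,   0,   1,   1,   1,   1,   0]

table = calibration_table(scores, y_true, group)
gap = max_calibration_gap(table)
```

Here the score 0.2 corresponds to the same positive rate in both groups, while the score 0.8 does not, so the score is not test-fair on this toy data.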

Definition 8. (Counterfactual Fairness). Predictor $\hat{Y}$ is counterfactually fair if, under any context $X=x$ and $A=a$, $P(\hat{Y}_{A \leftarrow a}(U)=y \mid X=x, A=a) = P(\hat{Y}_{A \leftarrow a'}(U)=y \mid X=x, A=a)$, for all $y$ and for any value $a'$ attainable by $A$ (NIPS2017_6995). The counterfactual fairness definition is based on the intuition that a decision is fair towards an individual if it is the same in both the actual world and a counterfactual world where the individual belonged to a different demographic group.
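
The intuition can be illustrated with a deliberately simple, deterministic toy model (an assumption for this sketch, not the model of the cited work): a latent background variable U and a binary protected attribute A jointly determine a feature X = U + ALPHA * A. A predictor that uses X directly changes when A is flipped with U held fixed, whereas a predictor that uses only the part of X not caused by A does not.

```python
ALPHA = 2.0  # assumed causal effect of the protected attribute A on the feature X


def feature(u, a):
    """Structural equation of the toy model: X = U + ALPHA * A."""
    return u + ALPHA * a


def predict_from_x(x, a):
    """Uses the observed feature directly, so it inherits A's effect on X."""
    return x


def predict_from_residual(x, a):
    """Uses only the part of X not caused by A (here this recovers U exactly)."""
    return x - ALPHA * a


def counterfactual_gap(predictor, u, a):
    """|actual prediction - prediction had A been flipped|, with U held fixed."""
    a_cf = 1 - a
    actual = predictor(feature(u, a), a)
    counterfactual = predictor(feature(u, a_cf), a_cf)
    return abs(actual - counterfactual)


gap_x = counterfactual_gap(predict_from_x, u=0.5, a=1)
gap_residual = counterfactual_gap(predict_from_residual, u=0.5, a=1)
```

The definition itself is stated over distributions, because U is generally not observed and must be inferred. This sketch sidesteps that by assuming the structural equation is known exactly.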

Definition 9. (Fairness in Relational Domains). A notion of fairness that is able to capture the relational structure in a domain—not only by taking attributes of individuals into consideration but by taking into account the social, organizational, and other connections between individuals (Farnadi:2018:FRD:3278721.3278733).

Definition 10. (Conditional Statistical Parity). For a set of legitimate factors $L$, predictor $\hat{Y}$ satisfies conditional statistical parity if $P(\hat{Y} \mid L=1, A=0) = P(\hat{Y} \mid L=1, A=1)$ (corbett2017algorithmic). Conditional statistical parity states that people in both protected and unprotected (e.g., female and male) groups should have equal probability of being assigned a positive outcome given a set of legitimate factors $L$.
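
The following sketch contrasts conditional statistical parity with plain demographic parity on toy data in which a single binary legitimate factor fully determines the prediction. The function name and the data are hypothetical, and a single binary factor is a simplification of the general set of legitimate factors.

```python
from collections import defaultdict


def conditional_positive_rates(y_pred, group, legit):
    """Return {legit_value: {group_value: P(Y_hat=1 | L=legit_value, A=group_value)}}."""
    counts = defaultdict(lambda: [0, 0])  # (l, a) -> [positive predictions, total]
    for yhat, a, l in zip(y_pred, group, legit):
        counts[(l, a)][0] += yhat
        counts[(l, a)][1] += 1
    rates = defaultdict(dict)
    for (l, a), (pos, total) in counts.items():
        rates[l][a] = pos / total
    return dict(rates)


# Hypothetical data: the legitimate factor is distributed differently across groups,
# and the predictor outputs 1 exactly when the legitimate factor is 1.
group  = [0, 0, 0, 0, 1, 1, 1, 1]
legit  = [1, 1, 1, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

rates = conditional_positive_rates(y_pred, group, legit)
overall = {a: sum(p for p, g in zip(y_pred, group) if g == a) / group.count(a)
           for a in (0, 1)}
```

Within each value of the legitimate factor the two groups are treated identically, so conditional statistical parity holds, even though the overall positive rates (0.75 versus 0.25) violate demographic parity.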

Fairness definitions fall under different types as follows:

  1. Individual Fairness. Give similar predictions to similar individuals (NIPS2017_6995; Dwork:2012:FTA:2090236.2090255).

  2. Group Fairness. Treat different groups equally (NIPS2017_6995; Dwork:2012:FTA:2090236.2090255).

  3. Subgroup Fairness. Subgroup fairness intends to obtain the best properties of the group and individual notions of fairness. It is different from these notions but uses them in order to obtain better outcomes. It picks a group fairness constraint, such as equalizing false positive rates, and asks that this constraint hold over an exponentially large collection of subgroups (kearns2018preventing; kearns2019empirical).

Name | Reference | Group | Individual
Demographic parity | (NIPS2017_6995) (Dwork:2012:FTA:2090236.2090255) | ✓ | -
Conditional statistical parity | (corbett2017algorithmic) | ✓ | -
Equalized odds | (hardt2016equality) | ✓ | -
Equal opportunity | (hardt2016equality) | ✓ | -
Fairness through unawareness | (NIPS2017_6995) (grgic2016case) | - | ✓
Fairness through awareness | (Dwork:2012:FTA:2090236.2090255) | - | ✓
Counterfactual fairness | (NIPS2017_6995) | - | ✓
Table 1. Categorizing different fairness notions into group vs. individual types.

It is important to note that, according to (kleinberg2016inherent), it is impossible to satisfy some of the fairness constraints at once except in highly constrained special cases. In (kleinberg2016inherent) authors show that the calibration condition and the balance conditions for the positive and negative classes are, in general, incompatible with each other, and can only be satisfied simultaneously in certain highly constrained cases. It is therefore important to take into consideration the context and application in which fairness definitions are to be used, and to use them accordingly (selbst2019fairness). Another important aspect to consider is time, and the temporal analysis of the impact that these definitions may have on individuals or groups. In (liu2018delayed) authors show that current fairness definitions are not always helpful and do not necessarily promote improvement for sensitive groups; in some cases they can actually be harmful when analyzed over time. They also show that measurement errors can act in favor of these fairness definitions; therefore, they highlight the importance of measurement and temporal modeling in the evaluation of fairness criteria, which raises a range of new challenges and trade-offs. It is also important to pay attention to the sources of bias and their types when trying to solve fairness-related questions.
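
A small arithmetic example can make this kind of incompatibility tangible. It uses a standard identity for binary predictions rather than the exact calibration and balance conditions of (kleinberg2016inherent): for a group with base rate p, the definitions of positive predictive value (PPV) and true positive rate (TPR) force the false positive rate to equal p/(1-p) * (1-PPV)/PPV * TPR. The function name and the numbers below are hypothetical.

```python
def implied_fpr(base_rate, ppv, tpr):
    """False positive rate forced by a group's base rate, PPV and TPR.

    Follows from PPV = TP / (TP + FP) with TP = tpr * base_rate and
    FP = fpr * (1 - base_rate), both expressed as population fractions.
    """
    return base_rate / (1 - base_rate) * (1 - ppv) / ppv * tpr


# Two hypothetical groups that share PPV and TPR but differ in base rate.
fpr_group_a = implied_fpr(base_rate=0.5, ppv=0.8, tpr=0.6)
fpr_group_b = implied_fpr(base_rate=0.2, ppv=0.8, tpr=0.6)
```

With equal PPV and equal TPR, the two groups end up with false positive rates of roughly 0.15 and 0.0375. So when base rates differ, equal predictive value and equal error rates cannot all hold at once unless the predictor is perfect.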

5. Methods for Fair Machine Learning

There have been numerous attempts to address bias in artificial intelligence in order to achieve fairness; these stem from different domains of AI. In this section we enumerate these domains and the work that each community has produced to combat bias and unfairness in its methods. Table 2 provides an overview of the different domains and sub-domains that we focus upon in this survey.

Domain Sub-domain Reference(s)
Data Simpson’s Paradox (kievit2013simpson)
Machine learning Classification (kamishima2012fairness) (pmlr-v81-menon18a) (goel2018non) (Krasanakis:2018:ASR:3178876.3186133) (pmlr-v97-ustun19a) (hardt2016equality) (zafar2015fairness) (woodworth2017learning) (huang2019stable) (calders2010three) (wu2018fairnessaware)
Machine learning Regression (berk2017convex) (agarwal2019fair)
Machine learning PCA (Samadi:2018:PFP:3327546.3327755)
Machine learning Community detection (mehrabi2019debiasing)
Machine learning Clustering (chen2019proportionally) (pmlr-v97-backurs19a)
Machine learning Graph embedding (bose2019compositional)
Machine learning Causal inference (loftus2018causal) (ijcai2017-549) (8477109) (Zhang2017) (nabi2018fair) (nabi2018learning) (Zhang:2016:STD:3060832.3061001) (kilbertus2017avoiding) (qureshi2016causal) (10.1007/978-3-319-39931-7_9)
Natural language processing Word embedding (bolukbasi2016man) (zhao2018learning) (gonen2019lipstick) (pmlr-v97-brunet19a) (zhao2019gender)
Natural language processing Coreference resolution (zhao2018gender) (rudinger-etal-2018-gender)
Natural language processing Language model (bordia2019identifying)
Natural language processing Sentence embedding (may2019measuring)
Natural language processing Machine translation (font2019equalizing)
Natural language processing Semantic role labeling (zhao2017men)
Deep learning / representation learning Variational auto encoders (louizos2016variational) (amini2019uncovering) (moyer2018invariant) (creager2019flexibly)
Deep learning / representation learning Adversarial learning (lemoine2018mitigating) (xu2018fairgan)
Table 2. List of papers targeting and talking about bias and fairness in different areas and sub-areas of machine learning.

While this section is largely domain-specific, it can be useful to take a cross-domain view. Generally, methods that target biases in the algorithms fall under three categories:

  1. Pre-processing. Pre-processing techniques attack the problem by removing the underlying discrimination from the data before any modeling (d2017conscientious). If the algorithm is allowed to modify the training data, then pre-processing can be used (bellamy2018ai).

  2. In-processing. In-processing techniques can be considered modifications of traditional learning algorithms to address discrimination during the model training phase (d2017conscientious). If the learning procedure of a machine learning model may be changed, then in-processing can be used during training, either as a constraint or incorporated into the objective function (bellamy2018ai; berk2017convex).

  3. Post-processing. Post-processing is the final class of methods and can be performed post-training. It relies on access to a holdout set that was not involved in the model’s training phase (d2017conscientious). If the algorithm can only treat the learned model as a black box without any ability to modify the training data or learning algorithm, then only post-processing can be used where data is labeled by some black-box model and then relabeled as a function only of the original labels (bellamy2018ai; berk2017convex).
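
As one concrete instance of the pre-processing category, the sketch below implements reweighing, a commonly used technique in which each training example receives the weight P(A=a)P(Y=y)/P(A=a,Y=y) so that the protected attribute and the label become statistically independent in the weighted data. It is offered as an illustration of the category only; the function names and the toy data are assumptions.

```python
from collections import Counter


def reweighing_weights(group, labels):
    """Weight per (group, label) cell: frequency expected under independence / observed."""
    n = len(labels)
    n_group = Counter(group)
    n_label = Counter(labels)
    n_cell = Counter(zip(group, labels))
    return {
        (a, y): (n_group[a] / n) * (n_label[y] / n) / (n_cell[(a, y)] / n)
        for (a, y) in n_cell
    }


def weighted_positive_rate(group, labels, weights, a):
    """P(Y=1 | A=a) in the reweighted training set."""
    total = sum(weights[(g, y)] for g, y in zip(group, labels) if g == a)
    positive = sum(weights[(g, y)] for g, y in zip(group, labels) if g == a and y == 1)
    return positive / total


# Hypothetical training labels: group 1 receives the positive label less often.
group  = [0, 0, 0, 0, 1, 1, 1, 1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

w = reweighing_weights(group, labels)
rate_0 = weighted_positive_rate(group, labels, w, 0)
rate_1 = weighted_positive_rate(group, labels, w, 1)
```

After reweighing, both groups have a weighted positive rate of 0.5. A learner that accepts per-example weights can then be trained on the unchanged features and labels.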

Examples of some existing work and their categorization into these types is shown in Table 3. These methods are not just limited to general machine learning techniques, but because of AI’s popularity, they have expanded to different domains in natural language processing and deep learning. From learning fair representations (creager2019flexibly; louizos2016variational; moyer2018invariant) to learning fair word embeddings (zhao2018learning; bolukbasi2016man; gonen2019lipstick), debiasing methods have been proposed in different AI applications and domains. Most of these methods try to avoid unethical interference of sensitive or protected attributes into the decision-making process, while others target exclusion bias by trying to include users from sensitive groups. In addition, some works try to satisfy one or more of the fairness notions in their methods, such as disparate learning processes (DLPs) which try to satisfy notions of treatment disparity and impact disparity by allowing the protected attributes during the training phase but avoiding them during prediction time (lipton2017does). A list of protected or sensitive attributes is provided in Table 4. They point out what attributes should not affect the outcome of the decision in housing loan or credit card decision-making (chen2019fairness) according to the law. Some of the existing work tries to treat sensitive attributes as noise to disregard their effect on the decision-making, while some causal methods try to use causal graphs, and disregard some paths in the causal graph that result in sensitive attributes affecting the outcome of the decision. Different bias-mitigating methods and techniques are discussed below for different domains—each targeting a different problem in different areas of machine learning in detail. 
This overview is intended to broaden the reader's view of where and how bias can affect a system, and to help researchers look carefully at new problems in which discrimination and bias may affect the outcome of a system.

5.1. Bias Mitigation

In order to mitigate the effects of bias in data, some general methods have been proposed that advocate good practices when using data, such as datasheets: supporting documents that report how a dataset was created and what characteristics, motivations, and potential skews it represents (gebrudatasheets; benjamintowards). (bender-friedman-2018-data) proposes a similar approach for NLP applications. A similar suggestion has been made for models in (Mitchell:2019:MCM:3287560.3287596). Authors in (holland2018dataset) also propose labels, similar to nutrition labels on food, in order to better characterize each dataset for each task. In addition to these general techniques, some work has targeted more specific types of bias. For example, (kievit2013simpson) has proposed methods to test for the existence of Simpson’s paradox in the data, and (alipourfard2018wsdm; alipourfard2018icwsm) proposed methods for automatically discovering Simpson’s paradoxes in data.
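
To show what a Simpson's paradox looks like in data, the sketch below checks whether one option wins in every stratum yet loses once the strata are pooled. It is a brute-force check on counts that are already stratified, not the testing or discovery procedures of the works cited above. The function names are hypothetical and the counts are illustrative.

```python
def success_rate(successes, trials):
    return successes / trials


def shows_simpsons_paradox(strata):
    """strata: {name: {"A": (successes, trials), "B": (successes, trials)}}.

    Returns True if one option wins in every stratum but loses (or ties)
    once the strata are pooled.
    """
    def pooled(option):
        s = sum(v[option][0] for v in strata.values())
        t = sum(v[option][1] for v in strata.values())
        return success_rate(s, t)

    a_wins_each = all(success_rate(*v["A"]) > success_rate(*v["B"])
                      for v in strata.values())
    b_wins_each = all(success_rate(*v["B"]) > success_rate(*v["A"])
                      for v in strata.values())
    return ((a_wins_each and pooled("A") <= pooled("B"))
            or (b_wins_each and pooled("B") <= pooled("A")))


# Illustrative counts: option A is better within each stratum, but it is applied
# mostly to the harder stratum, so it looks worse in the pooled data.
paradox_strata = {
    "easy cases": {"A": (81, 87),   "B": (234, 270)},
    "hard cases": {"A": (192, 263), "B": (55, 80)},
}
consistent_strata = {
    "easy cases": {"A": (9, 10), "B": (5, 10)},
    "hard cases": {"A": (6, 10), "B": (2, 10)},
}

has_paradox = shows_simpsons_paradox(paradox_strata)
no_paradox = shows_simpsons_paradox(consistent_strata)
```

The reversal arises because the stratum sizes differ sharply between the two options, which is why aggregate statistics alone can mislead when subgroups are unevenly represented.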

Algorithm Reference Pre-Processing In-Processing Post-Processing
Community detection (mehrabi2019debiasing)
Word embedding (pmlr-v97-brunet19a)
Optimized pre-processing (NIPS2017_6988)
Data pre-processing (Kamiran2012)
Classification (zafar2015fairness)
Regression (berk2017convex)
Classification (kamishima2012fairness)
Classification (wu2018fairnessaware)
Adversarial learning (lemoine2018mitigating)
Classification (hardt2016equality)
Word embedding (bolukbasi2016man)
Classification (NIPS2017_7151)
Table 3. Algorithms categorized into their appropriate groups based on being pre-processing, in-processing, or post-processing.
Attribute FHA ECOA
National origin
Familial status
Exercised rights under CCPA
Marital status
Recipient of public assistance
Table 4. A list of the protected attributes as defined by the Fair Housing Act (FHA) and Equal Credit Opportunity Act (ECOA) (chen2019fairness).

Causal models and graphs have also been used in some work to detect direct discrimination in the data, together with a prevention technique that modifies the data such that the predictions are free from direct discrimination (zhang2017achieving). (6175897) also worked on discrimination prevention in data mining, targeting direct, indirect, and simultaneous effects.

5.2. Fair Machine Learning

Algorithms trained on biased data can discriminate against protected classes in their predictions, violating fairness. To address this issue, a variety of methods have been proposed that satisfy some of the fairness definitions or other new definitions depending on the application.

5.2.1. Fair Classification

Since classification is a core task in machine learning and is widely used in areas that directly affect people, it is important that these methods be fair and free from biases that can harm some populations. Therefore, certain methods have been proposed (kamishima2012fairness; pmlr-v81-menon18a; goel2018non; Krasanakis:2018:ASR:3178876.3186133) that satisfy certain definitions of fairness. For instance, in (pmlr-v97-ustun19a) authors try to satisfy subgroup fairness in the classification task; (hardt2016equality) targets equality of opportunity and equalized odds, (zafar2015fairness) both disparate treatment and disparate impact, and (woodworth2017learning) equalized odds. Other methods try not only to satisfy some fairness constraints but also to be stable toward changes in the test set (huang2019stable). Another work proposes three different modifications to the existing Naive Bayes classifier for discrimination-free classification (calders2010three). The authors of (wu2018fairnessaware) propose a general framework for learning fair classifiers, which can be used for formulating fairness-aware classification with fairness guarantees.

5.2.2. Fair Regression

(berk2017convex) proposes a fair regression method and evaluates it with a measure, introduced as the “price of fairness” (POF), that captures accuracy-fairness trade-offs. They introduce three fairness penalties as follows:

Individual Fairness: For every cross pair $(x, y) \in S_1$, $(x', y') \in S_2$, a model $w$ is penalized for how differently it treats $x$ and $x'$ (weighted by a function of $|y - y'|$), where $S_1$ and $S_2$ are different groups from the sampled population.

Group Fairness: On average, the two groups’ instances should have similar labels (weighted by the nearness of the labels of the instances).

Hybrid Fairness: Hybrid fairness requires both positive and both negatively labeled cross pairs to be treated similarly in an average over the two groups.
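
The difference between the individual and group penalties can be sketched for a linear model as follows. The exact functional forms used here, including the exponential label weight, are assumptions chosen to match the verbal descriptions above and should not be read as the precise definitions of (berk2017convex); all names and data are hypothetical.

```python
import math


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def label_weight(y1, y2):
    """Assumed cross-pair weight: 1 for identical labels, decaying as they differ."""
    return math.exp(-(y1 - y2) ** 2)


def individual_penalty(w, s1, s2):
    """Average weighted squared prediction difference over all cross pairs."""
    total = sum(label_weight(y1, y2) * (dot(w, x1) - dot(w, x2)) ** 2
                for x1, y1 in s1 for x2, y2 in s2)
    return total / (len(s1) * len(s2))


def group_penalty(w, s1, s2):
    """Square of the average weighted prediction difference over all cross pairs."""
    total = sum(label_weight(y1, y2) * (dot(w, x1) - dot(w, x2))
                for x1, y1 in s1 for x2, y2 in s2)
    return (total / (len(s1) * len(s2))) ** 2


# Hypothetical groups of (features, label) pairs; all labels are equal,
# so every cross pair has weight 1.
s1 = [((1.0, 0.0), 1.0), ((0.0, 1.0), 1.0)]
s2 = [((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0)]
w = (1.0, -1.0)

ind = individual_penalty(w, s1, s2)
grp = group_penalty(w, s1, s2)
```

In this example the group penalty is zero because the differences cancel on average, while the individual penalty is positive because specific cross pairs with identical labels receive different predictions.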

In addition to the previous work, (agarwal2019fair) considers the fair regression problem with regard to two notions of fairness: statistical (demographic) parity and bounded group loss.

5.2.3. Structured Prediction

In (zhao2017men), authors studied semantic role-labeling models and a well-known dataset, imSitu, and found that in the imSitu training set 33% of cooking images have man in the agent role, while the rest have woman. They also noticed that, in addition to the existing bias in the dataset, the model amplifies this bias: after training a conditional random field (CRF), man fills only 16% of agent roles in cooking images. Based on these observations, the authors of (zhao2017men) show that structured prediction models risk leveraging social bias. Therefore, they propose a calibration algorithm called RBA (reducing bias amplification), a debiasing technique for calibrating predictions in structured prediction models. The idea behind RBA is to impose constraints ensuring that the model predictions follow the same distribution as the training data. They study two cases, visual semantic role labeling and multi-label object classification, and show how models for these tasks reflect and amplify the existing bias in the data.
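
The amplification described above can be quantified with a simple ratio: the share of an activity's agent roles filled by a given gender, compared between the training data and the model's predictions. The sketch below only measures the effect and does not implement RBA. The function name is hypothetical, and the counts are per-100-image figures chosen to match the percentages quoted above.

```python
def bias_score(counts, activity, gender):
    """Fraction of `activity` instances whose agent role is filled by `gender`."""
    per_gender = counts[activity]
    return per_gender[gender] / sum(per_gender.values())


# Counts per 100 cooking images, matching the 33% and 16% figures for "man".
train_counts = {"cooking": {"man": 33, "woman": 67}}
pred_counts = {"cooking": {"man": 16, "woman": 84}}

train_bias = bias_score(train_counts, "cooking", "woman")  # share in training data
pred_bias = bias_score(pred_counts, "cooking", "woman")    # share in predictions
amplification = pred_bias - train_bias
```

A positive amplification value means the model's predictions are more skewed toward that gender than the data it was trained on. A calibration method such as RBA aims to bring the predicted ratio back toward the training ratio.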

5.2.4. Fair PCA

In (Samadi:2018:PFP:3327546.3327755) authors show that vanilla PCA can exaggerate the reconstruction error for one group of people over a different group of equal size, so they propose a fair method to create representations with similar richness for different populations, rather than to make the populations indistinguishable or to hide dependence on a sensitive or protected attribute. They show that vanilla PCA on the Labeled Faces in the Wild (LFW) dataset has a higher reconstruction error for women than for men, even if male and female faces are sampled with equal weight. They aim to introduce a dimensionality reduction technique that maintains similar fidelity for different groups and populations in the dataset. Therefore, they introduce Fair PCA and define a fair dimensionality reduction algorithm. Their definition of Fair PCA (as an optimization problem) is as follows, in which $A$ and $B$ denote two subgroups, and $U_A$ and $U_B$ are matrices with rows corresponding to the rows of $U$ for groups $A$ and $B$, respectively, given $m$ data points in $\mathbb{R}^n$:

$$\min_{U \in \mathbb{R}^{m \times n},\ \operatorname{rank}(U) \le d} \ \max\left\{ \frac{1}{|A|} \operatorname{loss}(A, U_A),\ \frac{1}{|B|} \operatorname{loss}(B, U_B) \right\}$$

And their proposed algorithm is a two-step process listed below:

  1. Relax Fair PCA to a semidefinite optimization problem and solve SDP.

  2. Solve an LP designed to reduce the rank of said solution.

5.2.5. Community Detection/Graph Embedding/Clustering

Inequalities in online communities and social networks are another place where bias and discrimination can affect populations. For example, in online communities, users with fewer friends or followers are at a disadvantage when it comes to being heard on social media (mehrabi2019debiasing). In addition, existing methods, such as community detection methods, can amplify this bias by ignoring these low-connected users in the network or by wrongfully assigning them to irrelevant and small communities. In (mehrabi2019debiasing) authors show how this type of bias exists and is perpetuated by existing community detection methods. They propose a new attributed community detection method, called CLAN, to mitigate the harm toward disadvantaged groups in online social communities. CLAN is a two-step process that considers node attributes in addition to the network structure to address exclusion bias, as indicated below:

  1. Detect communities using modularity values (Step 1-unsupervised using only network structure).

  2. Train a classifier to classify users in the minor groups, putting them into one of the major groups using held-out node attributes (Step 2-supervised using other node attributes).

Fair methods in domains similar to community detection are also proposed, such as graph embedding (bose2019compositional) and clustering (chen2019proportionally; pmlr-v97-backurs19a).

5.2.6. Fair Causal Inference

Many researchers have used causal models and graphs to solve fairness-related concerns in machine learning. In (loftus2018causal), authors discuss in detail the subject of causality and its importance while designing fair algorithms. There has been much research on discrimination discovery and removal that uses causal models and graphs in order to make decisions that are irrespective of sensitive attributes of groups or individuals. For instance, in (ijcai2017-549) authors propose a causal-based framework that detects direct and indirect discrimination in the data along with their removal techniques. (8477109) is an extension to the previous work. (Zhang2017) gives a nice overview of most of the previous work done in this area by the authors, along with discussing system-, group-, and individual-level discrimination and solving each using their previous methods, in addition to targeting direct and indirect discrimination. By expanding on the previous work and generalizing it, authors in (nabi2018fair) propose a similar pathway approach for fair inference using causal graphs; this would restrict certain problematic and discriminative pathways in the causal graph flexibly given any set of constraints—as long as the path-specific effects are identifiable from the observed distribution. In (nabi2018learning) authors extended a formalization of algorithmic fairness from their previous work to the setting of learning optimal policies under fairness constraints. They describe several strategies for learning optimal policies by modifying some of the existing strategies, such as Q-learning, value search, and G-estimation, based on some fairness considerations. In (Zhang:2016:STD:3060832.3061001) authors only target discrimination discovery and no removal by finding instances similar to another instance and observing if a change in the protected attribute will change the outcome of the decision. If so, they declare the existence of discrimination. 
In (kilbertus2017avoiding), authors define two notions of discrimination, unresolved discrimination and proxy discrimination, as follows:
Unresolved Discrimination: A variable V in a causal graph exhibits unresolved discrimination if there exists a directed path from A to V that is not blocked by a resolving variable, and V itself is non-resolving.
Proxy Discrimination: A variable V in a causal graph exhibits potential proxy discrimination if there exists a directed path from A to V that is blocked by a proxy variable, and V itself is not a proxy.
They propose methods to prevent and avoid both. They also show that no observational criterion can determine whether a predictor exhibits unresolved discrimination; therefore, a causal reasoning framework needs to be incorporated.
In (qureshi2016causal), instead of using the usual risk difference $RD = p_1 - p_2$, authors propose a causal risk difference $RD^c = p_1 - p_2^c$ for causal discrimination discovery. They define $p_2^c$ to be:

$$p_2^c = \frac{\sum_{s \in S,\ dec(s) = \ominus} w(s)}{\sum_{s \in S} w(s)}$$

An $RD^c$ that is not close to zero means that there is a bias in decision value due to group membership (causal discrimination) or to covariates that have not been accounted for in the analysis (omitted variable bias). This then becomes their causal discrimination measure for discrimination discovery. (10.1007/978-3-319-39931-7_9) is another work of this type that uses causal networks for discrimination discovery.

5.3. Fair Representation Learning

5.3.1. Variational Auto Encoders

Learning fair representations and avoiding the unfair interference of sensitive attributes has been addressed in many different research papers. A well-known example is the Variational Fair Autoencoder introduced in (louizos2016variational). Here, they treat the sensitive variable as a nuisance variable, so that by removing the information about this variable they obtain a fair representation. They use a maximum mean discrepancy (MMD) regularizer in order to further promote invariance in the posterior distribution over latent variables. Adding this MMD penalty to the lower bound of their VAE architecture yields their proposed model, the Variational Fair Autoencoder. Similar work, but not targeting fairness specifically, has been introduced in (jaiswal2018unsupervised). In (amini2019uncovering) authors also propose a debiased VAE architecture called DB-VAE, which learns sensitive latent variables that can bias the model (e.g., skin tone, gender, etc.), and propose an algorithm on top of this DB-VAE that uses these latent variables to debias systems such as facial detection systems. In (moyer2018invariant) authors model their representation-learning task as an optimization objective that minimizes the loss together with the mutual information between the encoding and the sensitive variable. The relaxed version of this objective is shown in Equation 1. They use it to learn fair representations and show that adversarial training is unnecessary and in some cases even counter-productive. In Equation 1, $c$ is the sensitive variable and $z$ the encoding of $x$.

$$\min \ L(q, x) + \beta I(z, c) \qquad (1)$$


In (creager2019flexibly), authors introduce flexibly fair representation learning by disentanglement, which disentangles information from multiple sensitive attributes. Their flexible and fair variational autoencoder is flexible not only with respect to downstream task labels but also with respect to sensitive attributes. They address the demographic parity notion of fairness, which can target multiple sensitive attributes or any subset combination of them.

5.3.2. Adversarial Learning

In (lemoine2018mitigating) the authors present a framework to mitigate bias in models learned from data with stereotypical associations. They propose a model that maximizes the predictor’s ability to predict Y while minimizing an adversary’s ability to predict the protected or sensitive variable (the stereotyping variable Z). The model consists of two parts, the predictor and the adversary, as shown in Figure 6. The predictor is trained to accomplish the task of predicting Y given X by modifying its weights W to minimize some loss $L_P(\hat{y}, y)$, using a gradient-based method such as stochastic gradient descent. The output layer of the predictor is then used as input to another network, termed the adversary, which attempts to predict Z. The adversary may receive different inputs depending on the fairness definition to be achieved. For instance, to achieve Demographic Parity, the adversary receives only the predicted label $\hat{Y}$ and tries to predict the protected variable from it; the goal of the predictor is to prevent the adversary from doing so. Similarly, to achieve Equality of Odds, the adversary receives both $\hat{Y}$ and the true label Y. For Equality of Opportunity on a given class y, the training set of the adversary is restricted to examples where Y=y.

(xu2018fairgan) takes a different direction toward solving fairness issues with adversarial networks by introducing FairGAN, which generates synthetic data that is free from discrimination yet similar to the real data. The debiased synthetic data generated by FairGAN is then used instead of the real data for training and testing. Unlike many existing approaches, they do not try to remove discrimination from the original dataset; instead, they generate new datasets that are similar to the real one, debiased, and preserve good data utility. The architecture of the FairGAN model is shown in Figure 5. FairGAN consists of one generator, which generates fake data conditioned on the protected (sensitive) attribute s, and two discriminators. The first discriminator is trained to distinguish the real data from the generated fake data. In addition, to achieve fairness constraints such as statistical parity, the second discriminator is trained to distinguish the two categories of generated samples, indicating whether a generated sample is from the protected group or the unprotected group.
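Returning to the predictor-adversary framework of (lemoine2018mitigating) described above, the following sketch shows only how the adversary's input could be assembled for each fairness definition; it does not train either network. The function name, the string labels, and the choice to pass only the prediction in the Equality of Opportunity case (where the true label is constant after filtering) are assumptions for illustration.

```python
def adversary_inputs(y_hat, y_true, definition, target_class=None):
    """Build the adversary's input rows for a given fairness definition.

    y_hat: predictor outputs, y_true: true labels (same length).
    """
    if definition == "demographic_parity":
        # The adversary sees nothing but the predicted label.
        return [(p,) for p in y_hat]
    if definition == "equality_of_odds":
        # The adversary sees the predicted label and the true label.
        return [(p, y) for p, y in zip(y_hat, y_true)]
    if definition == "equality_of_opportunity":
        # The adversary's training set is restricted to examples with Y = y.
        return [(p,) for p, y in zip(y_hat, y_true) if y == target_class]
    raise ValueError(f"unknown fairness definition: {definition}")

# Toy predictions and labels (hypothetical values).
predictions = [0.9, 0.2, 0.7]
labels = [1, 0, 1]

dp_rows = adversary_inputs(predictions, labels, "demographic_parity")
eo_rows = adversary_inputs(predictions, labels, "equality_of_odds")
opp_rows = adversary_inputs(predictions, labels, "equality_of_opportunity", target_class=1)
```

In a full training loop, these rows would be fed to the adversary, whose task is to predict Z, while the predictor is updated to make that prediction harder.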

Figure 5. Structure of FairGAN as proposed in (xu2018fairgan).
Figure 6. The architecture of adversarial network proposed in (lemoine2018mitigating).

5.4. Fair NLP

5.4.1. Word Embedding

In (bolukbasi2016man) the authors noticed that when state-of-the-art word embeddings are used in word analogy tests, man is mapped to computer programmer and woman to homemaker. This bias against women prompted the authors to propose a debiasing method that respects the embeddings of gender-specific words but debiases the embeddings of gender-neutral words, following the steps below. (Step 2 has two options: depending on whether hard or soft debiasing is targeted, one uses either the first or the second sub-step.)

  1. Identify gender subspace. Identifying a direction of the embedding that captures the bias.

  2. Hard debiasing or soft debiasing:

    1. Hard debiasing (neutralize and equalize). Neutralize ensures that gender-neutral words are zero in the gender subspace. Equalize ensures that sets of words are perfectly equalized outside the subspace and thereby enforces the property that any neutral word is equidistant to all words in each equality set.

    2. Soft bias correction. Aims at minimizing the projection of the gender-neutral words onto the gender subspace, while maintaining as much similarity to the original embedding as possible, with a parameter that controls this trade-off.
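The neutralize part of hard debiasing described above can be sketched as follows. This is a minimal illustration that assumes a one-dimensional gender subspace, estimated here as the normalized sum of difference vectors over a few definitional word pairs (a simplification, not necessarily the procedure of (bolukbasi2016man)). The toy embeddings and function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def gender_direction(pairs, embeddings):
    """Estimate a single gender direction from (female, male) word pairs."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for female, male in pairs:
        diff = [a - b for a, b in zip(embeddings[female], embeddings[male])]
        total = [t + d for t, d in zip(total, diff)]
    return normalize(total)

def neutralize(vector, direction):
    """Remove the component along the gender direction, then re-normalize."""
    projection = dot(vector, direction)
    residual = [a - projection * d for a, d in zip(vector, direction)]
    return normalize(residual)

# Toy 3-dimensional embeddings (hypothetical values).
embeddings = {
    "she": [1.0, 0.2, 0.0],
    "he": [-1.0, 0.2, 0.0],
    "woman": [0.9, 0.1, 0.3],
    "man": [-0.9, 0.1, 0.3],
    "programmer": [-0.4, 0.5, 0.7],
}

g = gender_direction([("she", "he"), ("woman", "man")], embeddings)
debiased_programmer = neutralize(embeddings["programmer"], g)
```

After neutralization, the gender-neutral word has no component along the estimated gender direction. The equalize step and the soft variant are not shown.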

Following in the footsteps of these authors, later work (zhao2018learning) attempted to tackle this problem by training a gender-neutral version of GloVe, called GN-GloVe, that aims to preserve gender information in some dimensions of the word vectors while keeping the other dimensions free from gender influence. They use GloVe as the base model and gender as the protected attribute. However, a recent paper (gonen2019lipstick) argues against these debiasing techniques, stating that many recent works on debiasing word embeddings have been superficial: those techniques merely hide the bias and do not actually remove it. A recent work (pmlr-v97-brunet19a) took a new direction and proposed a preprocessing method that identifies the documents in the training corpus that contribute to bias, and tried to debias the system by efficiently perturbing or removing these documents from the training corpus. In a very recent work (zhao2019gender), the authors target bias in ELMo’s contextualized word vectors and attempt to analyze and mitigate the observed bias in the embeddings. They show that the corpus used for training ELMo has a significant gender skew, with male entities being nearly three times more common than female entities. This leads to gender bias in these pretrained contextualized embeddings. They propose two methods for mitigating the existing bias when using the pretrained embeddings in a downstream task, coreference resolution: (1) a train-time data augmentation approach, and (2) a test-time neutralization approach.

5.4.2. Coreference Resolution

The (zhao2018gender) paper shows that coreference systems have a gender bias. The authors introduce WinoBias, a benchmark focusing on gender bias in coreference resolution. In addition, they introduce a data-augmentation technique that, in combination with word2vec debiasing techniques, removes bias in existing state-of-the-art coreference methods. Their general approach is as follows: they first generate an auxiliary dataset in which all male entities are replaced by female entities and vice versa, using a rule-based approach; they then train models on the union of the original and auxiliary datasets. They also point out sources of gender bias in coreference systems and propose solutions to them. The first source of bias is the training data, which is addressed by the gender-swapped auxiliary dataset. Another source is resource bias (word embeddings are biased), for which the proposed solution is to replace GloVe with a debiased embedding method. Last, bias can come from unbalanced gender lists, and balancing the counts in the lists is the solution they propose. In another work (rudinger-etal-2018-gender), the authors also show the existence of gender bias in three state-of-the-art coreference resolution systems by observing that, for many occupations, these systems strongly prefer to resolve pronouns of one gender over another.
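The rule-based gender-swapping augmentation described above can be sketched as follows. The swap dictionary is a small hypothetical example, and whitespace tokenization is assumed; a realistic rule set would also need to handle pronoun case (for example, "her" as object versus possessive), punctuation, and names, which this sketch deliberately leaves out.

```python
# Hypothetical, deliberately small swap dictionary.
SWAP = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
    "boy": "girl", "girl": "boy",
}

def swap_gender(sentence):
    """Replace each gendered token with its counterpart, preserving capitalization."""
    swapped_tokens = []
    for token in sentence.split():
        lowered = token.lower()
        if lowered in SWAP:
            replacement = SWAP[lowered]
            if token[0].isupper():
                replacement = replacement.capitalize()
            swapped_tokens.append(replacement)
        else:
            swapped_tokens.append(token)
    return " ".join(swapped_tokens)

def augment(corpus):
    """Return the union of the original sentences and their gender-swapped copies."""
    return corpus + [swap_gender(sentence) for sentence in corpus]

training_sentences = ["She is a nurse", "The doctor said he was late"]
augmented = augment(training_sentences)
```

A model would then be trained on `augmented` rather than on the original sentences alone, so that each occupation appears with both genders equally often.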

5.4.3. Language Model

In (bordia2019identifying) the authors propose a metric to measure gender bias in a text corpus and in the text generated by a recurrent neural network language model trained on that corpus. They use the equation

$$bias(w) = \log\left(\frac{P(w \mid f)}{P(w \mid m)}\right),$$

where w is any word in the corpus, f is a set of gendered words that belong to the female category, such as she, her, woman, etc., and m is the corresponding set for the male category. They measure the bias using the mean absolute value and the standard deviation of the proposed metric, fit a univariate linear regression model over it, and then analyze the effectiveness of each of these measures in quantifying the bias.
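A minimal sketch of such a co-occurrence-based bias score is given below, computed as the log ratio of how often a word appears near female versus male gendered words. The word lists, the window size, the per-sentence counting, and the absence of smoothing (a word must co-occur with both categories) are assumptions made for illustration rather than the exact setup of (bordia2019identifying).

```python
import math
from collections import Counter

FEMALE_WORDS = {"she", "her", "woman"}
MALE_WORDS = {"he", "his", "man"}

def context_counts(sentences, gendered_words, window=3):
    """Count words that appear within `window` tokens of a gendered word."""
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token not in gendered_words:
                continue
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

def bias_score(word, sentences, window=3):
    """log(P(w|f) / P(w|m)); positive values indicate a female skew."""
    female = context_counts(sentences, FEMALE_WORDS, window)
    male = context_counts(sentences, MALE_WORDS, window)
    p_w_given_f = female[word] / sum(female.values())
    p_w_given_m = male[word] / sum(male.values())
    return math.log(p_w_given_f / p_w_given_m)

toy_corpus = [s.split() for s in [
    "she is a nurse", "she is a nurse", "she is a doctor",
    "he is a doctor", "he is a doctor", "he is a nurse",
]]

nurse_bias = bias_score("nurse", toy_corpus)
doctor_bias = bias_score("doctor", toy_corpus)
neutral_bias = bias_score("is", toy_corpus)
```

In this toy corpus "nurse" receives a positive score, "doctor" a negative one, and the function word "is" a score of zero.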


They also propose a regularization loss term for the language model that minimizes the projection of encoder-trained embeddings onto an embedding subspace that encodes gender following the soft debiasing technique introduced in (bolukbasi2016man). Finally, they evaluate the effectiveness of their method on reducing gender bias and conclude by stating that in order to reduce bias, there is a compromise on perplexity. They also point out the effectiveness of word-level bias metrics over the corpus-level metrics.

5.4.4. Sentence Encoder

In (may2019measuring) the authors extend the research on detecting bias in word embeddings to sentence embeddings. They generalize bias-measuring techniques such as the Word Embedding Association Test (WEAT (caliskan2017semantics)) to sentence encoders by introducing a new bias-measuring technique, the Sentence Encoder Association Test (SEAT). They apply SEAT to state-of-the-art sentence encoding techniques, such as CBoW, GPT, ELMo, and BERT, and find that although there is varying evidence of human-like bias in sentence encoders, more recent methods like BERT are more immune to biases. That being said, they do not claim that these models are bias-free; rather, they state that more sophisticated bias discovery techniques may be needed in these cases, thereby encouraging more future work in this area.
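To make the association test concrete, the sketch below computes a WEAT-style effect size from cosine similarities between two target sets (X, Y) and two attribute sets (A, B). The same computation can in principle be applied to sentence vectors in place of word vectors. The toy two-dimensional vectors, the function names, and the use of the sample standard deviation are assumptions for illustration.

```python
import math
from statistics import mean, stdev

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(w, A, B):
    """Mean similarity of w to attribute set A minus mean similarity to B."""
    return mean([cosine(w, a) for a in A]) - mean([cosine(w, b) for b in B])

def effect_size(X, Y, A, B):
    """Difference of mean associations of X and Y, scaled by the pooled spread."""
    assoc_x = [association(x, A, B) for x in X]
    assoc_y = [association(y, A, B) for y in Y]
    return (mean(assoc_x) - mean(assoc_y)) / stdev(assoc_x + assoc_y)

# Toy vectors (hypothetical): X leans toward A, Y leans toward B.
X = [[1.0, 0.1], [1.0, -0.1]]
Y = [[-1.0, 0.1], [-1.0, -0.1]]
A = [[1.0, 0.0]]
B = [[-1.0, 0.0]]

d_xy = effect_size(X, Y, A, B)
d_yx = effect_size(Y, X, A, B)
```

A positive effect size indicates that the first target set is more strongly associated with attribute set A than the second target set is; swapping the target sets flips the sign.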

5.4.5. Machine Translation

In (font2019equalizing) the authors noticed that when translating the word “friend” in the following two sentences from English to Spanish, they obtained different results, although in both cases the word should be translated the same way.
“She works in a hospital, my friend is a nurse.”
“She works in a hospital, my friend is a doctor.”
In both sentences, “friend” should be translated to the female form “amiga,” but the results did not reflect this expectation. For the second sentence, “friend” was translated to “amigo,” the male form in Spanish. This is because doctor is stereotypically associated with males and nurse with females, and the model picks up this stereotype and reflects it in its output. To solve this, the authors of (font2019equalizing) take advantage of the fact that word embeddings are used in machine translation: they apply existing word-embedding debiasing methods in the machine translation pipeline. This not only helped mitigate the existing bias in their system but also improved its performance by one BLEU point. In (prates2018assessing) the authors show that Google Translate can suffer from gender bias by constructing sentences from occupations listed by the U.S. Bureau of Labor Statistics in 12 different gender-neutral languages such as Hungarian, Chinese, and Yoruba, translating them into English, and showing that Google Translate exhibits a strong tendency toward male defaults for stereotypical fields such as STEM jobs. In (vanmassenhove2018getting) the authors annotated and analyzed the Europarl dataset (koehn2005europarl), a large political, multilingual dataset used in machine translation, and discovered that, with the exception of the youngest age group (20-30), which represents only a very small percentage of the total number of sentences (0.71%), more male data is available in all age groups. They also looked at the entire dataset and showed that 67.39% of the sentences are produced by male speakers. Furthermore, to mitigate gender-related issues and to improve morphological agreement in machine translation, they augmented every sentence with a tag on the English source side identifying the gender of the speaker. This helped the system in most cases, but not always, so further work has been suggested for integrating speaker information in other ways.

6. Challenges and Opportunities for Fairness Research

While there have been many definitions of, and approaches to, fairness in the literature, the study of this area is anything but complete. Fairness and algorithmic bias still hold a number of research opportunities. In this section, we provide pointers to outstanding challenges in fairness research and an overview of opportunities for the development of understudied problems.

6.1. Challenges

There are several remaining challenges to be addressed in the fairness literature. Among them are:

  1. Synthesizing a definition of fairness. Several definitions of what would constitute fairness from a machine learning perspective have been proposed in the literature. These definitions cover a wide range of use cases, and as a result are somewhat disparate in their view of fairness. Because of this, it is nearly impossible to understand how one fairness solution would fare under a different definition of fairness. Synthesizing these definitions into one remains an open research problem.

  2. From Equality to Equity. The definitions presented in the literature mostly focus on equality, ensuring that each individual or group is given the same amount of resources, attention or outcome. However, little attention has been paid to equity, which is the concept that each individual or group is given the resources they need to succeed (gooden2015race). Operationalizing this definition and studying how it augments or contradicts existing definitions of fairness remains an exciting future direction.

  3. Searching for Unfairness. Given a definition of fairness, it should be possible to identify instances of this unfairness in a particular dataset. Inroads toward this problem have been made in the areas of data bias by detecting instances of Simpson’s Paradox in arbitrary datasets (alipourfard2018wsdm); however, unfairness may require more consideration due to the variety of definitions and the nuances in detecting each one.

Figure 7. Heatmap depicting distribution of previous work in fairness, grouped by domain and fairness definition.

6.2. Opportunities

In this work we have taxonomized and summarized the current state of research into algorithmic biases and fairness, with a particular focus on machine learning. Even in this area alone, the research is broad. Subareas from natural language processing to representation learning to community detection have all seen efforts to make their methodologies fairer. Nevertheless, not every area has received the same amount of attention from the research community. Figure 7 provides an overview of what has been done in different areas to address fairness, categorized by fairness definition type and domain. Some areas (e.g., community detection at the subgroup level) have received no attention in the literature and could be fertile future research areas.

7. Conclusion

In this survey we introduced problems that can adversely affect AI systems in terms of bias and unfairness. The issues were viewed primarily from two dimensions: data and algorithms. We illustrated some of the problems and showed why fairness is an important issue, and we described some of the real-world harm that unfairness can cause in society, in applications such as judicial systems, face recognition, and promoting algorithms. We then went over the definitions of fairness and bias that have been proposed by researchers. To further stimulate the interest of readers, we reviewed work done in different areas to address the biases that may affect AI systems, covering methods and domains such as general machine learning, deep learning, and natural language processing. We then subdivided these fields into a more fine-grained analysis of each subdomain and the work being done to address fairness constraints in each. Our hope is to encourage readers to think deeply, while working on a system or a method, about ensuring that it has a low likelihood of causing harm or bias toward a particular group. With the expansion of AI use in our world, it is important that researchers take this issue seriously and expand their knowledge in this field. In this survey we categorized and created a taxonomy of what has been done so far to address fairness issues in different domains. Future work can build on this to address the existing problems and biases in AI that we discussed in the previous sections.

8. Acknowledgments

This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR0011890019. We would like to thank Mila and IVADO for their summer school on Bias and Discrimination in AI and all the speakers. Special thanks to Golnoosh Farnadi and Jihane Lamouri for organizing the event.

9. Appendix

9.1. Datasets for Fairness Research

Aside from the existence of bias in datasets, there are datasets that are specifically used to address bias and fairness issues in machine learning. There are also some datasets that are introduced to target the issues and biases previously observed in older existing datasets. Below we list some of the widely known datasets that have the characteristics discussed in this survey.

9.1.1. UCI Adult Dataset

The UCI Adult dataset, also known as the “Census Income” dataset, contains information extracted from the 1994 census data about people, with attributes such as age, occupation, education, race, sex, marital-status, native-country, hours-per-week, etc., and indicates whether the income of a person exceeds $50K/yr. It can be used in fairness-related studies that want to compare gender or race inequalities based on people’s annual incomes, or in various other studies (Asuncion+Newman:2007).

9.1.2. German Credit Dataset

The German Credit dataset contains 1000 credit records containing attributes such as personal status and sex, credit score, credit amount, housing status etc. It can be used in studies about gender inequalities on credit-related issues (Dua:2019).

9.1.3. WinoBias

The WinoBias dataset follows the Winograd format and contains references to people using a vocabulary of 40 occupations. It contains two types of challenge sentences that require linking gendered pronouns to either male or female stereotypical occupations. It was used in the coreference resolution study to test whether a system has gender bias, in this case toward stereotypical occupations (zhao2018gender).

9.1.4. Communities and Crime Dataset

The Communities and Crime dataset gathers information from different communities in the United States related to several factors that can highly influence some common crimes such as robberies, murders or rapes. The data combines socio-economic data from the 1990 US Census, law enforcement data from the 1990 US LEMAS survey, and crime data from the 1995 FBI UCR (redmond2011communities).

9.1.5. COMPAS Dataset

The COMPAS dataset contains the criminal history, jail and prison time, demographics and COMPAS risk scores for defendants from Broward County from 2013 and 2014 (larson2016compas).

9.1.6. Recidivism in Juvenile Justice Dataset

The Recidivism in Juvenile Justice dataset contains all juvenile offenders who, in 2010, finished a sentence in the juvenile justice system of Catalonia. The corresponding crimes were committed between 2002 and 2010 when the offenders were aged 12-17 years (tolan2019machine).

9.1.7. Pilot Parliaments Benchmark Dataset

The Pilot Parliaments Benchmark dataset, also known as PPB, contains images of 1270 individuals from three African countries (Rwanda, Senegal, South Africa) and three European countries (Iceland, Finland, Sweden) selected for gender parity in the national parliaments. This benchmark was released to achieve better intersectional representation on the basis of gender and skin type (pmlr-v81-buolamwini18a).

9.1.8. Diversity in Faces Dataset

Diversity in Faces (DiF) is a large and diverse dataset designed to advance the study of fairness and accuracy in face recognition technology. DiF provides annotations for one million face images. The annotations cover intrinsic facial features and include craniofacial distances, areas and ratios, facial symmetry and contrast, skin color, age and gender predictions, subjective annotations, and pose and resolution (merler2019diversity).

| Dataset Name | Reference | Size | Area |
|---|---|---|---|
| UCI adult dataset | (Asuncion+Newman:2007) | 48,842 income records | Social |
| German credit dataset | (Dua:2019) | 1,000 credit records | Financial |
| Pilot parliaments benchmark dataset | (pmlr-v81-buolamwini18a) | 1,270 images | Facial images |
| WinoBias | (zhao2018gender) | 3,160 sentences | Coreference resolution |
| Communities and crime dataset | (redmond2011communities) | 1,994 crime records | Social |
| COMPAS Dataset | (larson2016compas) | 18,610 crime records | Social |
| Recidivism in juvenile justice dataset | (capdevila2005reincidencia) | 4,753 crime records | Social |
| Diversity in faces dataset | (merler2019diversity) | 1 million images | Facial images |

Table 5. Most widely used datasets in the fairness domain with additional information about each of the datasets including their size and area of concentration.