No computation without representation: Avoiding data and algorithm biases through diversity

02/26/2020, by Caitlin Kuhlman, et al.

The emergence and growth of research on issues of ethics in AI, and in particular algorithmic fairness, has roots in an essential observation that structural inequalities in society are reflected in the data used to train predictive models and in the design of objective functions. While research aiming to mitigate these issues is inherently interdisciplinary, the design of unbiased algorithms and fair socio-technical systems are key desired outcomes which depend on practitioners from the fields of data science and computing. However, these computing fields broadly also suffer from the same under-representation issues that are found in the datasets we analyze. This disconnect affects the design of both the desired outcomes and metrics by which we measure success. If the ethical AI research community accepts this, we tacitly endorse the status quo and contradict the goals of non-discrimination and equity which work on algorithmic fairness, accountability, and transparency seeks to address. Therefore, we advocate in this work for diversifying computing as a core priority of the field and our efforts to achieve ethical AI practices. We draw connections between the lack of diversity within academic and professional computing fields and the type and breadth of the biases encountered in datasets, machine learning models, problem formulations, and interpretation of results. Examining the current fairness/ethics in AI literature, we highlight cases where this lack of diverse perspectives has been foundational to the inequity in treatment of underrepresented and protected group data. We also look to other professional communities, such as in law and health, where disparities have been reduced both in the educational diversity of trainees and among professional practices. We use these lessons to develop recommendations that provide concrete steps for the computing community to increase diversity.


1. Introduction

The pervasive use of automated technologies in our society has prompted concerns regarding the fair and ethical use of large scale demographic data sets to make decisions that impact people’s lives, particularly in legally regulated domains such as criminal justice, education, housing, and healthcare (Angwin et al., 2016; Barocas and Selbst, 2016). Along with this new paradigm come opportunities to use data analysis that is accurate, reproducible, and transparent to address societal issues. Many sources of data that reflect disparities in social outcomes come from populations that are identified as either vulnerable or underrepresented. Here, vulnerable populations are defined as those lacking the social capital to represent themselves, including children, incarcerated persons, students, and the economically alienated/poor (Shivayogi, 2013). Underrepresented groups, by contrast, are defined as individuals from ethnic minority populations or gender groups that have undergone historical discrimination and, as we will highlight further in this paper, are also underrepresented with respect to their participation in the technology workforce (Inc. and Inc., [n.d.]). Data from these vulnerable and underrepresented groups show that they continue to be subject to systemic structural biases, often manifesting at data collection, that can skew the outcome of automated decision making processes.

Indeed, the interdisciplinary community that has recently arisen to address these biases in algorithmic design and deployment has made great strides in identifying unfairness and working to address it from a computational perspective. A current limitation of the community’s work is that it has not yet succeeded in fully capturing the diverse perspectives of those populations most affected by potentially biased algorithmic systems. We argue that this gap undermines progress on the exact problems that much of the community targets in its research output. To address these gaps, some conversation in technology communities has centered on the idea of educational pipeline development as the area in which the greatest strides in ameliorating the diversity deficit can be realized (P. Wesley Schultz and Serpe, 2013). While we appreciate the role that recruitment of underrepresented groups plays in broadening the field, we think that this approach critically under-utilizes potential diversity resources.

Thus in this work we advocate for diversifying computing and the AI research community itself as a core priority of the field and of our research efforts. We limit our discussion mainly to applications of algorithmic fairness research within the social construct of North America. We acknowledge that these challenges extend beyond the borders of the U.S., but given that we, the authors, are based in the U.S., and that the U.S. represents a relevant test case for diversity, this scope is most appropriate; the main themes of this discussion are applicable and can be extended to other places. To encourage and facilitate discussion and innovation around this goal we make the following contributions:

  1. We make clear the connection between the lack of diversity of communities represented in datasets and the type and breadth of the biases encountered in our data analysis, and the interpretation thereof.

  2. We highlight recent research from the ethics in AI community which illustrates cases where a lack of diversity of perspectives may have been a critical factor in the design of models and methods which suffer from unfair bias against protected groups.

  3. We identify positive efforts of the ethics in AI community with respect to the diversity deficit in computer science, while making recommendations that are informed by the best practices of fields external to computing.

2. Why Improving Diversity is Essential to the Ethics in AI Community

Biases in data and algorithms are critical issues, and efforts to address them are essential as computing researchers and practitioners design models and algorithms that are deployed in ever more real-world scenarios. Much scholarship within the ethics in AI community addresses unfair practices against members of vulnerable or underrepresented groups, including the explicit use of protected data attributes such as age, gender, race, or ethnicity, as well as indirect discrimination that occurs when group status is exploited inadvertently (Feldman et al., 2015).

Bias in data may occur when there is unequal representation of protected groups. Algorithms trained on datasets encoding such biases can then exhibit biased performance across groups (Caliskan et al., 2017; Chen et al., 2018). Additionally, even when datasets are equally representative of groups, biases in objective functions, for example optimizing for an outcome that can be driven by features of protected classes, can also result in unfair outcomes (Obermeyer and Mullainathan, 2019). However, even if these two issues are addressed, other systematic issues can certainly persist. Here we formally articulate the reason that such systematic issues critically impact the work of detecting and mitigating unfair bias in algorithmic systems.

Existing research has proposed many statistical “fairness” criteria. To a first approximation, most of these criteria fall into three categories defined by different (conditional) independence relationships between the random variables of the sensitive attribute $A$, the target variable $Y$, and the classifier or score $\hat{Y}$: independence, separation, and sufficiency (Barocas et al., 2019). Accordingly, being based only on $A$, $Y$, and $\hat{Y}$, these notions do not incorporate any context that may result in or perpetuate such inequalities. To further illustrate this, we examine a popular notion of discrimination defined as statistical parity (Dwork et al., 2012), also referred to as disparate impact (Feldman et al., 2015). This notion requires that a certain group-conditional beneficial outcome rate be the same for the groups of interest. Formally, the bias, given as the following, should be minimized:

$\left| P(\hat{Y} = 1 \mid A = 0) - P(\hat{Y} = 1 \mid A = 1) \right| \qquad (1)$

where $\hat{Y}$ is the predictor, a random variable that depends on $A$, $X$, and $U$. Here $A$ represents the group status associated with an individual, defined by some protected attributes which must not be discriminated against; $X$ represents the other observable attributes of any particular individual; $U$ is the set of attributes which are not observed; and $Y$, as above, is the outcome to be predicted, e.g. by a machine learning algorithm. While the omission of context can be considered a limitation of the above notions, that omission may also remove attention from, or camouflage, the broader, multilevel, or structural causes of such inequities, which can hinder their pursuit and limit sustainable equity. Following this, we formally identify the challenge of structural inequality (Stolte and Emerson, 1977) and define it in line with this notation.

Definition. Structural inequality is a condition in which one category of people is attributed an unequal status in relation to other categories of people, and this relationship is perpetuated and reinforced by a confluence of unequal relations in roles, functions, decisions, rights, and opportunities. Therefore, if a class $A = 1$ is subject to a structural inequality, that would mean that $Y$ is confounded, and even if the class is represented in the data and statistical parity holds (Equation 1 equals zero), the measure of bias represented by this formulation may not be fully meaningful.
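To make Equation 1 concrete, the short sketch below computes the statistical parity difference from a vector of binary predictions and group labels. It is a minimal illustration only; the variable names (`y_pred`, `group`) and the toy data are our own and not from the paper.

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups (Equation 1).

    y_pred: array of binary predictions (0/1) from the predictor Y-hat.
    group:  array of binary group indicators A (0 = one group, 1 = the other).
    """
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    rate_a0 = y_pred[group == 0].mean()  # P(Y-hat = 1 | A = 0)
    rate_a1 = y_pred[group == 1].mean()  # P(Y-hat = 1 | A = 1)
    return abs(rate_a0 - rate_a1)

# Toy example: equal positive rates across groups give a parity difference of 0,
# even though unobserved structural factors (U) may still differ between groups.
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print(statistical_parity_difference(y_pred, group))  # 0.0
```

As the definition above emphasizes, a value of zero here says nothing about whether an unobserved confounder is driving unequal outcomes for the groups.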

2.1. Impact of Structural Inequality on Algorithmic Fairness Analysis

A structural challenge could occur in many real-world situations. Consider an example from the healthcare domain. Say getting effective treatment for a particular condition is the positive outcome $Y = 1$. Even if the probability that a patient gets treated for that condition is equal across all groups defined by attribute $A$, and there is data for all groups in the considered dataset, there could still exist an unobserved confounder, $U$, that impacts the outcomes for groups of patients. For instance, in this case the confounder could be access to treatment due to lower levels of healthcare provider trust in particular groups of patients’ use of pain medication (Burgess, 2011).

Figure 1. In a simple case, a “confounding” effect can be represented by $U$, which affects both $A$ and $Y$: e.g., access to healthcare, which may differ across groups (independent of the attributes $X$) and also affect the outcome label $Y$ for treatment. The variables and pathways through which structural inequities operate need to be further identified so that they can be accounted for in such models.
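To illustrate how a confounder of this kind can hide behind a parity check, the sketch below simulates the causal structure in Figure 1 under assumptions we have invented for illustration: an unobserved variable $U$ (access to care) differs across groups $A$ and drives the true outcome $Y$, while the deployed predictor achieves equal positive rates across groups.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Group membership A and an unobserved confounder U (e.g., access to care)
# whose distribution differs by group -- the structural inequality.
A = rng.integers(0, 2, size=n)
U = rng.binomial(1, np.where(A == 1, 0.3, 0.7))   # P(U = 1) is lower for A = 1

# True outcome Y (effective treatment) depends on U.
Y = rng.binomial(1, np.where(U == 1, 0.8, 0.4))

# A predictor that ignores U and assigns positives at the same rate to both
# groups, so statistical parity (Equation 1) is satisfied exactly.
Y_hat = rng.binomial(1, 0.6, size=n)

parity_gap = abs(Y_hat[A == 0].mean() - Y_hat[A == 1].mean())
outcome_gap = abs(Y[A == 0].mean() - Y[A == 1].mean())

print(f"statistical parity difference: {parity_gap:.3f}")        # ~0: looks 'fair'
print(f"gap in true outcomes across groups: {outcome_gap:.3f}")  # substantial
```

Because $U$ is unobserved, an audit that only inspects $\hat{Y}$ and $A$ would not flag this disparity; this is the sense in which structural inequality escapes the measure in Equation 1.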
Dataset Description — Sensitive Attribute(s): race, gender, age, other
Adult: U.S. Census income data (Dua and Graff, 2017). (Kearns et al., 2019; Friedler et al., 2019; Noriega-Campero et al., 2018; Oneto et al., 2019) (Kearns et al., 2019; Celis et al., 2019; Albarghouthi and Vinitsky, 2019; Friedler et al., 2019; Oneto et al., 2019; Ali et al., 2019; McNamara et al., 2019; Cardoso et al., 2019) (Kearns et al., 2019) (Kearns et al., 2019; Albarghouthi and Vinitsky, 2019)
Common Crawl: Occupation biographies (Foundation, [n.d.]). (De-Arteaga et al., 2019)
Comm. & Crime: U.S. Census and crime data (Dua and Graff, 2017). (Kearns et al., 2019; Heidari et al., 2018)
COMPAS: recidivism risk assessment data (Angwin et al., 2016). (Canetti et al., 2019; Celis et al., 2019; Friedler et al., 2019; Cardoso et al., 2019; Goel and Faltings, 2019; McNamara et al., 2019) (Friedler et al., 2019)
Dutch census income data (Calders and Verwer, 2010). (Cardoso et al., 2019)
FICO: credit scores from TransUnion (Hardt et al., 2016). (Milli et al., 2019)
German Credit: creditworthiness dataset (Dua and Graff, 2017). (Celis et al., 2019) (Friedler et al., 2019; Noriega-Campero et al., 2018)
Health Risk: proprietary scores from a health organization (Obermeyer and Mullainathan, 2019). (Obermeyer and Mullainathan, 2019)
Heart Health Prediction (Dua and Graff, 2017). (Noriega-Campero et al., 2018)
HMDA: Home Mortgage Disclosure Act data (Bureau, 2014). (Chen et al., 2019)
IHDP: Infant Health and Development Program (Brooks-Gunn et al., 1994). (Madras et al., 2019)
Justice: court processing data for felony defendants (of Justice. Office of Justice Programs. Bureau of Justice Statistics, 1998). (Green and Chen, 2019)
LSAC: National Longitudinal Bar Passage Study (Wightman, 1998). (Kearns et al., 2019) (Kearns et al., 2019) (Kearns et al., 2019)
LSAT: Law school admission test scores and grades (Wachter et al., 2017). (Russell, 2019)
MEPS: Medical Expenditure Panel Survey (Agency for Healthcare Research and Quality, 2018) (Coston et al., 2019) (Coston et al., 2019)
Mexican household survey (Ibarrarán et al., 2017). (Noriega-Campero et al., 2018)
Mobile Money Loan Approval in East Africa (Speakman et al., 2018) (Coston et al., 2019)
PPB: Pilot Parliaments Benchmark (Buolamwini and Gebru, 2018). (Amini et al., 2019; Kim et al., 2019; Raji and Buolamwini, 2019) (Amini et al., 2019; Kim et al., 2019; Raji and Buolamwini, 2019)
Ricci v. DeStefano U.S. Supreme Court case (of the United States, 2009). (Friedler et al., 2019)
Stanford Medicine Research Data Repository (Lowe et al., 2009). (Pfohl et al., 2019) (Pfohl et al., 2019) (Pfohl et al., 2019)
Student achievement in secondary education (Dua and Graff, 2017). (Kearns et al., 2019) (Kearns et al., 2019) (Kearns et al., 2019)
THEOP: Texas Higher Education (Tienda and Sullivan, 2011). (Borgs et al., 2019)
Table 1. Datasets and sensitive data attributes targeted for evaluating fairness mitigation models in FAT and AIES 2019 papers.

Figure 1 uses a causal graph to illustrate such a scenario. The use of causal frameworks to understand the unfair impact of such confounders on potentially biased prediction has been proposed (Kusner et al., 2017), along with methods to uncover the interactions between unobserved variables and outcomes. For example, Kusner et al. (Kusner et al., 2017) use a single confounder in an accident prediction model, and Kannan et al. (Kannan et al., 2019) evaluate a hiring problem. However we know that for such complex real world interactions, there are many confounders likely to have an impact on outcomes.
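To make the causal-framework idea concrete, below is a minimal counterfactual-style check in the spirit of Kusner et al. (2017); the structural equations, coefficients, and values are our own invented illustration, not the models from the cited works.

```python
def score(a, u, eps=0.0):
    """Toy structural causal model: observable X depends on the protected attribute A
    and an unobserved background variable U; the learned predictor uses X only.
    All coefficients are invented for illustration."""
    x = 0.8 * u + 0.5 * a + eps   # structural equation for X
    return 1.2 * x                # predictor Y_hat = f(X), assumed fit on observational data

# Counterfactual check in the spirit of Kusner et al. (2017): hold an individual's
# background U and noise fixed, and intervene on A. A counterfactually fair
# predictor would return (approximately) the same score under both interventions.
u = 0.4
print("score under do(A=0):", round(score(0, u), 3))   # 0.384
print("score under do(A=1):", round(score(1, u), 3))   # 0.984 -> not counterfactually fair
```

A real analysis would, of course, require positing and defending the full causal graph, which is exactly where the many unmodeled confounders discussed above become the central difficulty.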

What makes structural inequities more challenging than a typical confounder is found in the definition above, wherein a structural inequality is “perpetuated and reinforced by a confluence of unequal relations in roles, functions, decisions, rights, and opportunities,” indicating that the confounder could still affect $Y$ in an unknown or unaddressed way at any given point in time. In other words, by nature, the structural inequality is of an encompassing magnitude and difficult to quantify. Identifying a single variable to capture this situation is not straightforward, and it might not fit into the simple (yet robust) paradigms often considered.

Structural inequality may influence interactions throughout the causal graph. For instance, in the hiring problem considered in (Kannan et al., 2019), it is assumed that an employer at the end of a hiring pipeline is rational – that it computes and makes a decision based on a posterior distribution, and that all necessary data is available for this. However, we know that such decisions come down to the judgments of human analysts, whose decision making is impacted by structural inequalities through their own implicit bias. Several studies have shown that biases in hiring practices continue over time and show no sign of decreasing, despite the availability of information and policies which promote equal opportunity (Quillian et al., 2017).

To continue our healthcare example, structural inequities may manifest through many different mechanisms. Studies have demonstrated the impact of social deprivation on health outcomes and have suggested multiple pathways that may contribute to adverse outcomes (Cattell, 2013), for example patient-related health beliefs and behavior, as well as access to care through delayed presentation or limited access to medical services (Prentice and Pizer, 2013). Moreover, the decision to order tests can be affected by human judgments: if doctors are biased against patients from group $A = 1$, those patients may be less likely to be tested (which can be referred to as the selective labels problem (Lakkaraju et al., 2017)). However, bias in the decision to treat conditional on test results has not been shown or suggested (Mullainathan and Obermeyer, 2019). Each of these occurrences can generate adverse outcomes at the individual level and can also lead to structural inequities over time if there is sufficient penetrance of the described behaviors.

In sum, these examples serve to highlight how the specific issues at hand, in our example here healthcare outcome disparities, are complex and may require further domain insight or awareness in order to fully develop a given problem statement and solution formulation. Therefore in addition to the valuable approaches to fairness mitigation proposed in the recent literature, we feel it is important to consider such problems in the context of the conditions created by structural inequality.

2.2. Impact of Structural Inequality on the Computing Community

To gain insight into the current paradigm of research into fairness and bias mitigation strategies, we consider the recent literature which focuses on the treatment of historically disadvantaged groups. Such study typically defines groups by sensitive data attributes protected by U.S. law in high impact domains (Barocas and Selbst, 2016), including the Fair Housing Act (FHA) (data.gov US Housing and Department, 2019) and the Equal Credit Opportunity Act (ECOA) (data.gov, [n.d.]). The data attributes protected under these laws include age, disability, gender identity, marital status, national origin, race, receipt of public assistance, religion, and sex. To demonstrate the problem settings covered, we present an (incomplete) survey of recent papers from exemplary leading conferences on fairness and AI Ethics for anecdotal consideration. Table 1 summarizes papers from the 2019 ACM Conference on Fairness, Accountability, and Transparency (FAT) and the 2019 AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES). (The reader is referred to https://fatconference.org/network/ for a comprehensive listing of similar venues.) Papers included are those which experimentally evaluate bias-mitigation algorithms or fairness metrics. Examining the datasets used and the protected groups targeted, we can see that the majority of analysis focuses on a narrow set of attributes, with race and gender the most prevalent sensitive attributes targeted (in 44 out of 57 experiments).

We present this overview of the fairness and ethics in AI research community’s focused attention on the impact of structural inequality on women and racial or ethnic minorities in the United States in order to contrast it with the inclusion of these groups in the field of computing. Unfortunately, we see stark disparities in the participation of these groups in tech jobs and computing education. This disparity can be seen across computer science educational programs, research institutions, and technical jobs in industry. For instance, we refer to the Taulbee survey (Zweben and Bizot, 2019), which has been conducted by the Computing Research Association (CRA) annually since 1974. In the latest 2018 survey, across 169 PhD-granting programs in the U.S. and Canada we see huge gender imbalances in computer science (77.7% male) and computer engineering (80.7% male) (Figure 2(a)). There is also a troubling distribution across racial or ethnic groups, with white students making up 22.9% of enrolled students and black and latinx students making up only 2.0% and 1.7% respectively (Figure 2(b)). The survey also reveals the additional insight that 62.6% of students attending these programs are from home countries different from where their institutions are located. This shows how stark the differences in engagement with computing are, particularly within the U.S. population. Similar disparity is present in industry, as illustrated in Figure 2. We see the same gender imbalance worldwide in technical roles across top technology companies, and within the U.S. the same trend in racial disparities.

(a) Gender parity values for technical employees.
(b) Breakdown of racial and ethnic groups. *Native American includes Native Americans, Alaska Natives, Native Hawaiian and Pacific Islanders
Figure 2. Demographic breakdown of technical employees at top technology companies, PhD-granting institutions, and computing occupations in the U.S. Values for the companies are sourced from their most recent annual reports (Microsoft, 2018; Brown, Danielle Brown and Parker, Melonie, 2019; Facebook, 2019). Educational values are from the Taulbee survey (Zweben and Bizot, 2019), and U.S. occupational data is from the Bureau of Labor Statistics analysis of computer and mathematical occupations in the January 2018 Current Population Survey (Bureau of Labor Statistics, U.S. Department of Labor, 2018).

We explicitly demonstrate this troubling disconnect between the subjects of research in fairness and ethical AI and the body of researchers and practitioners because the lack of needed domain insight and diverse perspectives has dire implications for the ability of the field to build on this crucial research and to responsibly implement the proposed methods in production systems. Failing to improve diversity in the computing field while advancing bias mitigation technologies sets the field up for failure, leaving researchers and practitioners under-resourced to preempt sources of unfair bias in the technologies they design and build.

For example, a recent study surveying machine learning practitioners was conducted to understand how tools enabled by machine learning can have a more positive impact on industry practice (Holstein et al., 2019). Several gaps were identified from themes of this discussion. Specifically, for the design of algorithmic systems, the crucial need to address biases in the humans embedded throughout the machine learning development pipeline was highlighted. Additionally, survey respondents noted their susceptibility to blind spots, in part due to the lack of diverse perspectives within their own teams as compared to the real-world users who interacted with their products once they were deployed. Improving the diversity of the workers and practitioners involved in this process could aid in ameliorating these issues, as it would directly provide an understanding of real world needs for appropriate product development.

It has been pointed out extensively how a lack of diversity leads to poor outcomes in many fields of endeavor. Examples range from evidence in hospitals that less diverse teams of doctors and nurses deliver worse patient care (Cohen et al., 2002; Alsan et al., 2018), to management in firms (Dezsö and Ross, 2012), scientific discovery (Nielsen et al., 2018), and economic profit (Noland et al., 2016). The marked gender and racial disparity we see in computing no doubt similarly impacts innovation and value in the development of new technologies.

3. Fairness in the Literature and Possible Confounding

To further elaborate on this premise that diverse representation could be a proactive approach to mitigating data and algorithm biases, we next identify specific ways in which greater diversity among the designers and creators of algorithmic systems would have been integral in avoiding the cited scenarios studied in a number of recent ethics in AI papers.

3.1. Biases in Data

A highly cited example of data bias is the Gender Shades study by Buolamwini and Gebru, which highlighted disparate performance in commercial facial recognition systems (Buolamwini and Gebru, 2018). This well-discussed scenario highlights how the designers of an image processing algorithm may not think of all its implications for different populations, namely people with a skin color that is not their own, or not representative of the majority of the people around them. Due to a lack of representation of both female faces and dark-skinned faces in the training datasets used, prediction rates by these commercial systems suffered greatly for these groups.
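A lightweight guard against this failure mode is disaggregated evaluation: reporting accuracy per demographic subgroup rather than a single aggregate number. The sketch below is a generic illustration; the column names and toy data are our own and are not from the Gender Shades study.

```python
import pandas as pd

# Hypothetical evaluation log: one row per test image, with the model's
# correctness and the subgroup annotations used for auditing.
results = pd.DataFrame({
    "correct":   [1, 1, 0, 1, 0, 0, 1, 1, 1, 0],
    "gender":    ["F", "F", "F", "M", "F", "F", "M", "M", "M", "F"],
    "skin_tone": ["dark", "light", "dark", "light", "dark",
                  "dark", "light", "dark", "light", "dark"],
})

# A single aggregate number can hide large subgroup gaps...
print("overall accuracy:", results["correct"].mean())

# ...while a disaggregated (intersectional) report surfaces them.
by_group = results.groupby(["gender", "skin_tone"])["correct"].agg(["mean", "count"])
print(by_group)
```

Reporting the intersectional breakdown, as Gender Shades does, is precisely what exposed the disparities that aggregate benchmarks had masked.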

Another example of disparity due to training data is in natural language processing, where debiasing word embeddings has been a priority area of work (Bolukbasi et al., 2016; Garg et al., 2018). Historical stereotypes are reflected in the corpora of text used to train these embedding models, which are then used widely as a pre-processing step for automated text analysis. There may be both passive and active ways of putting together image or text datasets for algorithm development, and in both cases a proactive approach to sourcing such datasets could avoid wasting time and resources, as well as avoid inflicting unfair or harmful outcomes on underrepresented groups. Recognizing that no set of texts or images is guaranteed to be free of bias, and operating in an anticipatory mode, could help to address and resolve such issues.
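One way such stereotypes can be surfaced before an embedding is reused downstream is to project words onto a gender direction, in the spirit of Bolukbasi et al. The toy vectors below are invented for illustration; a real audit would use the trained embedding matrix itself.

```python
import numpy as np

# Toy 3-dimensional "embeddings" standing in for vectors from a trained model.
emb = {
    "he":       np.array([ 1.0, 0.1, 0.0]),
    "she":      np.array([-1.0, 0.1, 0.0]),
    "engineer": np.array([ 0.6, 0.8, 0.1]),
    "nurse":    np.array([-0.7, 0.8, 0.1]),
}

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Gender direction estimated from a definitional pair, as in Bolukbasi et al.
gender_dir = emb["he"] - emb["she"]

# Occupation words with large-magnitude projections inherit a gender association
# from the training corpus, which downstream systems then silently reuse.
for word in ("engineer", "nurse"):
    print(word, round(cos(emb[word], gender_dir), 3))
```

Running such a probe over a full vocabulary is a cheap, anticipatory check before an embedding is shipped as a pre-processing component.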

3.2. Algorithmic Bias

Another recent paper, by Obermeyer et al., identifies a ‘problem formulation error’, in other words a mis-specified objective function, as a source of unfair bias in an automated system. In this study, they examine a commercial algorithm that is deployed nationwide today and affects millions of people (Obermeyer and Mullainathan, 2019). They show that, at the same health risk score, black patients are considerably sicker than white patients, due to the way the risk score is attributed to different illnesses that occur disparately. Instead of optimizing over health-related variables, a proxy label (in this case, cost) was used. Though not discussed by the authors, a diverse team may have identified this issue during the design of the risk score, prior to it being deployed and affecting the lives of millions of people.
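The sketch below is a stylized simulation (all numbers are our own assumptions, not estimates from the study) of why optimizing a cost proxy can disadvantage a group: if one group incurs lower recorded cost at the same level of health need, ranking patients by predicted cost under-selects that group for the care-management program.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

group = rng.integers(0, 2, size=n)              # A: 0 or 1
need = rng.gamma(shape=2.0, scale=1.0, size=n)  # true health need (what the program should target)

# Assumed structural gap: at equal need, group 1 generates less recorded cost
# (e.g., because of unequal access to care), so cost is a biased proxy for need.
cost = need * np.where(group == 1, 0.7, 1.0) + rng.normal(0, 0.1, size=n)

# Program enrolls the top 10% by the proxy (cost) vs. by true need.
def top_decile_share(score, group, g=1):
    cutoff = np.quantile(score, 0.9)
    selected = score >= cutoff
    return (group[selected] == g).mean()

print("share of group 1 among enrollees, ranking by cost:", round(top_decile_share(cost, group), 3))
print("share of group 1 among enrollees, ranking by need:", round(top_decile_share(need, group), 3))
```

The disparity here is introduced entirely at the problem formulation step, before any model is trained, which is why scrutiny of the label choice matters as much as scrutiny of the learning algorithm.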

In some ways this work connects to the broader idea of problem formulation, which has been discussed by (Passi and Barocas, 2019). That study, combining ethnographic fieldwork with ideas from sociology and the history of science, as well as critical data studies, sought to describe the complex set of actors and activities involved in problem formulation. Its broad conclusions demonstrated that problem specification and operationalization are always dynamic processes and that normative considerations are rarely included. Their work thus also highlights the need for a broad range of perspectives and considerations at problem and algorithm formulation time.

3.3. Missing Labels

Another major issue in the AI ethics literature that is being addressed via algorithmic solutions is that of missing data, specifically when membership labels for a protected class are unavailable (Chen et al., 2019). Indeed, much work on algorithmic fairness must assume that protected attributes are known (Kusner et al., 2017; Lakkaraju et al., 2017). This is a reasonable assumption for work that builds on existing data; however, the challenge of missing labels, which are needed to identify a class and ensure it is represented and accounted for, reinforces the need for proactive recording of such labels. This challenge may also directly benefit from diverse groups of people being involved in dataset creation and analysis, who may be able to identify such attributes or recognize when they are not represented.
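When the protected attribute is missing entirely, one family of approaches (studied by Chen et al., 2019) estimates disparities through a proxy model that outputs group-membership probabilities. The sketch below shows only the simplest probability-weighted estimate of group-conditional positive rates; the proxy probabilities here are invented, and Chen et al. analyze precisely when such estimates are themselves biased.

```python
import numpy as np

# Binary decisions and, in place of the unobserved attribute A, a proxy model's
# estimate of P(A = 1) for each individual (e.g., from geography-based proxies).
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p_a1   = np.array([0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.6, 0.2])  # hypothetical values

# Probability-weighted estimates of P(Y_hat = 1 | A = a), for a in {0, 1}.
rate_a1 = np.sum(y_pred * p_a1) / np.sum(p_a1)
rate_a0 = np.sum(y_pred * (1 - p_a1)) / np.sum(1 - p_a1)

# Chen et al. (2019) show such proxy-based estimates can be systematically off,
# which is one more reason proactive recording of protected attributes matters.
print("estimated statistical parity difference:", round(abs(rate_a1 - rate_a0), 3))
```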

4. Recommendations for Increasing Diversity within Computing and the Ethics in AI Community

The recent literature has emphasized that increasing diversity is not simply a “pipeline” problem (van den Hurk, 2019). As such, we next discuss three areas in which we see potential to enhance diversity and inclusion in computing research and education by engaging the ethics in AI community: (1) connecting to a broader network of higher education institutions, (2) including stakeholders from diverse communities in the research process, and (3) creating opportunities within our own activities to support a diverse group of future leaders. We look to examples of successes from other disciplines where structural inequality has impacted the diversity of practitioners and therefore the outcomes in those fields. We also believe that these recommendations will address challenges in creating sustainable diversity in computing and beyond, including societal norms (Uhlmann and Cohen, 2005; Bowles et al., 2005), limited access (Ibarra, 1995), heterogeneous sourcing (Hunt et al., 2015), tokenism (Torchia et al., 2011), and unfair treatment (Brooks et al., 2014; Moss-Racusin et al., 2012), all of which have been described in work on diversity and representation gaps. While increasing diversity at the table does not automatically fix equity, it is thought to improve it at the individual and community level, in part via exposure to different values and backgrounds (Richards et al., 2007).

In recommending these tangible steps we hope to build on the excellent work of the community to date. We feel the social relevance, interdisciplinary structure, and crucial importance of the emerging field of AI Ethics make it rich with potential for broadening participation in computing by appealing to students’ interests and values. Making a positive social impact has been demonstrated to motivate non-traditional students in computer science education (Buckley et al., 2008; Goldweber et al., 2013). In addition, many research venues have already been evolving with diversity and inclusion as part of their core values, and this trend is encouraging. We note the success of the affinity groups Black in AI, Women in Machine Learning, LatinX in AI, and Queer in AI at NeurIPS, a top venue for machine learning, and the addition of the “Critiquing and Rethinking Accountability, Fairness and Transparency” (CRAFT) call at the 2020 FAT conference, as evidence of these communities’ openness to innovative solutions to the lack of diversity in traditional computing conferences. Other considerations, such as keeping the cost of attending low, providing scholarships for students, and making conference material available online through open access and livestreams, also serve to keep communities open to a wide audience and promote engagement. We applaud these efforts.

4.1. Building Collaborations with Minority Serving Institutions

The United States continues to see significant barriers to the full inclusion of underrepresented groups in technology disciplines, as detailed in Section 2.2. Affirmative action and other efforts to address pipeline problems have their limitations, as evidenced by analysis coming out of the fairness community itself (Kannan et al., 2019). One way to bolster the number of underrepresented perspectives in computing is for the FAT community to partner with minority serving institutions in the study of socio-technical algorithmic systems. Minority serving institutions in the U.S. include 108 Historically Black Colleges and Universities (HBCUs), 274 Hispanic Serving Institutions (HSIs), 35 Tribal Colleges and Universities (TCUs), and underrepresented Asian American and Pacific Islander Serving Institutions (AAPASIs). While HBCUs comprise only 3% of America’s institutions of higher education, they produce 24% of all bachelor’s degrees earned by African-Americans (of the Interior, 2019). Within STEM disciplines they are responsible for graduating 40% of all African American STEM graduates (Owens Emiel and Kenyatta, 2012). Similarly, 40% of Hispanic-American students are awarded their bachelor’s degrees from HSIs (of the Interior, 2019).

When we broadly examine gender parity in science, technology, engineering, and mathematics, it becomes clear that not every discipline has met with the same level of success. Comparing gender parity in training and the workforce across medical schools, law schools, and various tech industry staples, law school admission is one area where women have achieved parity in educational advancement since 2015 (Pisarcik, 2019). Interestingly, it has been minority serving institutions, Historically Black Colleges and Universities (HBCUs), and non-traditional learner institutions that have been at the vanguard of this trend, due to their high enrollment numbers for female students. For example, in 2018 North Carolina Central University had a law school enrollment that was 66.85% female, Atlanta’s John Marshall Law School was 66.21% female, Northeastern University was 65.76% female, and Howard University was 65.70% female. These female enrollment numbers can be compared to the top five U.S. News and World Report ranked law schools in the country, whose female enrollment does not exceed 49.6% (Enjuris, 2019).

These statistics point to a potential solution to similar issues in computing education. While the tech industry and research institutions often focus their recruitment on a small number of predominantly elite institutions, the lessons garnered from law school admissions suggest that partnering with non-traditional and minority serving institutions may be the way forward in addressing the lack of both gender and ethnic diversity in educational programs. These minority serving institutions (MSIs) are where nearly half of the underrepresented trainees in computing are located, and they represent a potentially untapped resource of diverse perspectives. Furthermore, of particular relevance to the study of fairness and ethics in AI is the fact that these institutions have a robust intellectual tradition of contextualizing the lives of marginalized populations.

Research partnerships can occur between individual researchers or as the result of organization to organization collaboration through a memorandum of understanding outlining specific exchanges and projects. We believe these efforts would represent a structural increase in the participation of underrepresented groups, bring a diversity of perspectives to bear on the design of algorithmic systems and thus directly address a goal of the ethics in AI community. Developing meaningful ongoing collaborations that can contextualize the implicit and sometimes explicit biases inherent in data is essential for this task.

4.2. Prioritizing Research Collaboration Between the Ethics in AI Community and Underrepresented/Interdisciplinary Groups

In addition to partnering with minority serving educational institutions, we see emphasizing collaborations with underrepresented and vulnerable groups themselves as a critical piece of broadening the scope of knowledge that the ethics in AI community has to draw upon. Recent scholarship in design-based research considers race and power dynamics between researchers and researched communities (Vakil et al., 2016; Vakil and Ayers, 2019), and can provide guidance on designing interventions which have meaningful impacts. For instance, it has been observed that traditional eurocentric epistemologies in research communities are often disconnected from the cultural practices and ways of knowing of underrepresented and vulnerable communities (Bang and Medin, 2010). Only by working closely with these communities can we begin to incorporate this knowledge into our problem designs. Having a diversity of perspectives can support the development of culturally responsive computing technologies and educational pedagogy which take a proactive, inclusive approach considering intersectionality, innovations, and technosocial activism, rather than one that requires accommodations after the fact for communities left out at the development phase (Burgstahler, 2011; Scott et al., 2015). Engagement can happen at all stages of the data analysis pipeline, and may be particularly important during the collection, analysis, and interpretation of data from these communities. For instance, recent work (Jackson et al., 2019) illustrates three robust case studies of collaborations that data scientists can have with underrepresented communities, including biomedical applications for improving the health of underserved populations. These projects came out of proactive collaborations with members of these communities and suggest that underrepresented and indigenous communities are interested in being more than the subjects of research or the passive recipients of knowledge derived about their own communities.

The interdisciplinary nature of the ethics in AI community inherently facilitates collaboration between experts in different fields. In particular, collaboration with social scientists who have been assessing the mechanisms of structural inequities has much to offer the field. Though the work of social scientists may not be performed in a quantitative manner, this provides an opportunity for collaboration with quantitative communities interested in fairness. In general, since identifying and quantifying sources of structural inequality is a complex task, it is a potential area of interdisciplinary synergy between quantitative scientists and the rich literature that already exists in the social sciences. This topic has been explored from multiple angles, including from the fields of education, political science, sociology, health, and urban studies.

For example, a study on criminal justice algorithms might be greatly enriched by including social scientists from the communities most adversely affected by biases in the analysis of judicial data. This recommendation can also help to close the communication loop, returning data findings to those underserved communities that are all too often left disconnected from the analytic fruits of their data. Creating collaborations between researchers interested in AI ethics, underrepresented data scientists, and complementary domain expert thought leaders in these communities can lead to more robust insights into how to prevent algorithmic inequalities. Ways of accomplishing such collaboration include community advisory boards, which many biomedical research institutions employ to formalize academic–community partnerships (Newman et al., 2011), as well as research-based open houses that invite the community to learn about current research projects and provide opportunities to participate in research (Lachney, 2017).

A potential outcome of these strengthened ties of research collaboration could be to increase the number of interested trainees who enter the field. This could also serve as a research idea generator where researchers use community input to define those research areas that represent their most immediate needs. Through a process of collaboration, community activists can amplify the algorithmic messages identified through careful ethics in AI research.

4.3. Providing Enhanced Mentorship to Trainees at Ethics in AI Research Conferences

A barrier to full and diverse participation in the computing research community is the lack of onsite mentorship from senior researchers in communal spaces. Many students from diverse communities have shallower professional networks than other students, and this can inhibit their introduction to and advancement in research. This has a potentially pervasive effect on recruitment and retention efforts, in spite of indications that black and latinx students show higher interest in learning computer science (Inc. and Inc., [n.d.]). Networking not only serves as an information dissemination tool, but also as a critical skill for trainees to develop for later career advancement. This can be especially challenging for trainees from underrepresented groups during networking-intense events like conferences. One ongoing concern is that underrepresented students are often less likely to have robust senior mentoring networks and strong ties to industry or research partnerships. This means that the ability to get recommendations to move forward, the prestige of those recommendations, and the ability to ask for introductions are all truncated.

One way to increase and retain diverse participants within the computing research community is to use volunteer research mentors at conferences who can serve as a bridge between more senior participants and those underrepresented trainees and early career scientists who may not be well integrated into the community. Pairing these individuals with more senior researchers can rapidly expand the networks of newcomers, thereby increasing the likelihood that they will be able to make long-term contributions to the field.

One example of a robust mentoring network exists within the Society for Molecular Biology and Evolution (SMBE). SMBE has developed a mentoring program to pair trainees and early career scientists with established researchers. This is particularly effective because the society has an ethos of inclusiveness and of contributing to the next generation of scientists and technologists. While their mentorship program is not limited to individuals from underrepresented groups, they make efforts to match trainees to mentors based on merit, area of research interest, languages spoken, and geographical location (for Molecular Biology and Evolution, [n.d.]). Trainees communicate with their conference mentors prior to the start of the meeting, giving both trainees and mentors a chance to learn about each other’s research interests and career goals. During the conference, trainees and SMBE mentors are invited to have dinner together, meet up during the breaks, and check in on research talks. This interaction creates a guided experience of the meeting that immediately connects neophytes with institutional knowledge. This link between well-networked researchers and those in need of connections can also facilitate retention of trainees and promote enthusiasm for the discipline amongst a broader cohort of participants.

An important feature of the SMBE mentoring program is the inclusion of travel support for trainees whose characteristics will broaden the capacity of the organization to reach its diversity goals (for Molecular Biology and Evolution, [n.d.]). Travel support for vulnerable and underrepresented trainees and early career scientists is a necessary investment in diversity and inclusion for computing communities. While merit-only awards are an excellent way to incentivize groundbreaking research, we observe that these programs often play into the pre-existing resource imbalances that favor trainees at elite institutions, where resources are less constrained than they are at most MSIs. This means that the institutions that train large portions of underrepresented graduates have the least likely path to participation in conferences and workshops, while facing the greatest financial barriers to entry into these intellectually rich spaces.

In Poverty and Power, Royce asserts that social networks are particularly important for underrepresented or marginalized populations because the ties binding people together and connecting individuals to organizations can:

”…channel information, convey cultural messages, create social solidarities, forge expectations and obligations, facilitate the enforcement of social norms, engender relations of mutual trust, serve as sources of social support, and operate as conduits of power and influence. In the performance of these functions, furthermore, social networks shape the distribution of resources and opportunities, advantaging some and disadvantaging others.” (Royce, 2019)

This recitation of the comprehensive utility of social networks supports our assessment that mentoring, a modality for building robust social networks, can serve as a necessary tool to build diversity in the ethics in AI community amongst trainees and early career scientists from underrepresented populations.

The use of affinity and mentoring workshop programs, such as those mentioned previously and the Broadening Participation Workshops associated with conferences in a variety of computing sub-disciplines, has the secondary effect of building a new cadre of research leaders who can continue to invest their intellectual and service efforts towards the betterment of the field. Investment in trainees and early career researchers is an essential part of organizational capacity building. The ethics in AI community is a relatively new research community that could see substantial benefits from the increased inclusion of trainees and early career scientists from underrepresented groups.

One such workshop program is the Broadening Participation in Data Mining (BPDM) Workshop. This workshop traditionally brings together underrepresented trainees and early career scientists to increase their exposure to academic, industry, and federal careers in data science. In 2019, the 7th annual BPDM workshop brought together 55 trainees and 10 mentors at Howard University, a Historically Black College/University. In the past, BPDM was associated with national or international conferences such as ACM SIGKDD or SIAM CSE. This 3-day workshop allowed participants to network with peers and senior mentors within a community of other individuals from underrepresented communities. As part of a computing tutorial exercise during the workshop, participants spent time using the COMPAS dataset to identify key themes and possible solutions to structural inequality (Jackson et al., 2019). This tutorial session was important not only because it addressed fairness and ethics principles within a computer science context, but also because it encouraged participants to use computing to address issues that were of considerable concern to the social justice needs of their communities. Finally, the workshop provided opportunities to create an intellectual community for participants whose experience in undergraduate, graduate, and post-doctoral education is isolating by virtue of their identity.

5. Conclusion

Education efforts that grow representation in meaningful ways may obviate the need for much algorithmic manipulation, and also aid retention because no single member of an underrepresented community has to shoulder the burden of speaking for their group. This work recognizes opportunities for the ethics in AI community to increase the breadth of perspectives in computing in order to further our pursuit of algorithmic fairness. We have proposed three major areas within which the community can address structural inequalities: building educational collaborations with minority serving institutions, building capacity through research collaborations with community domain experts, and using educational mentoring to develop a cadre of diverse future leaders in computing. This can be accomplished by meeting underrepresented and vulnerable communities ‘where they are.’ This idea refers both to identifying the educational institutions that produce the largest numbers of underrepresented trainees, and to using mentoring approaches to increase opportunities for trainees to actively participate in communal spaces such as computing conferences.

We support an organic, bottom-up approach that both assists researchers in improving the fairness of algorithmic systems and empowers underserved gender and ethnic communities to realize equity. This capacity building approach within research communities can also help to reinforce more equitable research resources for collaborating partners, and increased communal activism to ameliorate the structural inequalities reflected in the data that the ethics in AI community works to account for. Without attempting to enact these educational initiatives, the ethics in AI community may miss a unique opportunity to build diverse perspective capacity, and to sustainably improve fairness, accountability, and transparency in socio-technical systems by addressing structural inequalities.

References

  • Agency for Healthcare Research and Quality (2018) Agency for Healthcare Research and Quality. 2018. Medical Expenditure Panel Survey. https://www.ahrq.gov/data/meps.html.
  • Albarghouthi and Vinitsky (2019) Aws Albarghouthi and Samuel Vinitsky. 2019. Fairness-aware programming. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 211–219.
  • Ali et al. (2019) Junaid Ali, Muhammad Bilal Zafar, Adish Singla, and Krishna P Gummadi. 2019. Loss-Aversively Fair Classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 211–218.
  • Alsan et al. (2018) Marcella Alsan, Owen Garrick, and Grant C Graziani. 2018. Does diversity matter for health? Experimental evidence from Oakland. Technical Report. National Bureau of Economic Research.
  • Amini et al. (2019) Alexander Amini, Ava Soleimany, Wilko Schwarting, Sangeeta Bhatia, and Daniela Rus. 2019. Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure. (2019).
  • Angwin et al. (2016) Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. Machine bias. Pro Publica (2016).
  • Bang and Medin (2010) Megan Bang and Douglas Medin. 2010. Cultural processes in science education: Supporting the navigation of multiple epistemologies. Science Education 94, 6 (2010), 1008–1026.
  • Barocas et al. (2019) Solon Barocas, Moritz Hardt, and Arvind Narayanan. 2019. Fairness and Machine Learning. fairmlbook.org. http://www.fairmlbook.org.
  • Barocas and Selbst (2016) Solon Barocas and Andrew D Selbst. 2016. Big data’s disparate impact. Cal. L. Rev. 104 (2016), 671.
  • Bolukbasi et al. (2016) Tolga Bolukbasi, Kai-Wei Chang, James Y Zou, Venkatesh Saligrama, and Adam T Kalai. 2016. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in neural information processing systems. 4349–4357.
  • Borgs et al. (2019) Christian Borgs, Jennifer Chayes, Nika Haghtalab, Adam Tauman Kalai, and Ellen Vitercik. 2019. Algorithmic greenlining: An approach to increase diversity. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 69–76.
  • Bowles et al. (2005) Hannah Riley Bowles, Linda Babcock, and Lei Lai. 2005. It depends who is asking and who you ask: Social incentives for sex differences in the propensity to initiate negotiation. (2005).
  • Brooks et al. (2014) Alison Wood Brooks, Laura Huang, Sarah Wood Kearney, and Fiona E Murray. 2014. Investors prefer entrepreneurial ventures pitched by attractive men. Proceedings of the National Academy of Sciences 111, 12 (2014), 4427–4431.
  • Brooks-Gunn et al. (1994) Jeanne Brooks-Gunn, Fong-ruey Liaw, and Pamela Kato Klebanov. 1994. Effects of early intervention on cognitive function of low birth weight preterm infants. Pediatric Physical Therapy 6, 1 (1994), 40–41.
  • Brown, Danielle Brown and Parker, Melonie (2019) Brown, Danielle Brown and Parker, Melonie. 2019. Google diversity annual report. https://diversity.google/annual-report/. [Online; accessed 21-August-2019].
  • Buckley et al. (2008) Michael Buckley, John Nordlinger, and Devika Subramanian. 2008. Socially relevant computing. In ACM SIGCSE Bulletin, Vol. 40. ACM, 347–351.
  • Buolamwini and Gebru (2018) Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on fairness, accountability and transparency. 77–91.
  • Bureau (2014) Consumer Financial Protection Bureau. 2014. Using publicly available information to proxy for unidentified race and ethnicity: A methodology and assessment. Washington, DC: CFPB, Summer (2014).
  • Bureau of Labor Statistics, U.S. Department of Labor (2018) Bureau of Labor Statistics, U.S. Department of Labor. 2018. Current Population Survey, Employed persons by detailed occupation, sex, race, and Hispanic or Latino ethnicity. https://www.bls.gov/cps/cpsaat11.htm. [Online; accessed 21-August-2019].
  • Burgess (2011) Diana J Burgess. 2011. Addressing racial healthcare disparities: how can we shift the focus from patients to providers? Journal of general internal medicine 26, 8 (2011), 828–830.
  • Burgstahler (2011) Sheryl Burgstahler. 2011. Universal design: Implications for computing education. ACM Transactions on Computing Education (TOCE) 11, 3 (2011), 19.
  • Calders and Verwer (2010) Toon Calders and Sicco Verwer. 2010. Three naive Bayes approaches for discrimination-free classification. Data Mining and Knowledge Discovery 21, 2 (2010), 277–292.
  • Caliskan et al. (2017) Aylin Caliskan, Joanna J Bryson, and Arvind Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science 356, 6334 (2017), 183–186.
  • Canetti et al. (2019) Ran Canetti, Aloni Cohen, Nishanth Dikkala, Govind Ramnarayan, Sarah Scheffler, and Adam Smith. 2019. From soft classifiers to hard decisions: How fair can we be?. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 309–318.
  • Cardoso et al. (2019) Rodrigo L Cardoso, Wagner Meira Jr, Virgilio Almeida, and Mohammed J Zaki. 2019. A Framework for Benchmarking Discrimination-Aware Models in Machine Learning. (2019).
  • Cattell (2013) Vicky Cattell. 2013. Poor people, poor places, and poor health: the mediating role of social networks and social capital. Social Science & Medicine 52, 10 (2001), 1501–1516. https://doi.org/10.1016/S0277-9536(00)00259-8
  • Celis et al. (2019) L Elisa Celis, Lingxiao Huang, Vijay Keswani, and Nisheeth K Vishnoi. 2019. Classification with fairness constraints: A meta-algorithm with provable guarantees. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 319–328.
  • Chen et al. (2018) Irene Chen, Fredrik D Johansson, and David Sontag. 2018. Why is my classifier discriminatory?. In Advances in Neural Information Processing Systems. 3539–3550.
  • Chen et al. (2019) Jiahao Chen, Nathan Kallus, Xiaojie Mao, Geoffry Svacha, and Madeleine Udell. 2019. Fairness under unawareness: Assessing disparity when protected class is unobserved. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 339–348.
  • Cohen et al. (2002) Jordan J Cohen, Barbara A Gabriel, and Charles Terrell. 2002. The case for diversity in the health care workforce. Health affairs 21, 5 (2002), 90–102.
  • Coston et al. (2019) Amanda Coston, Karthikeyan Natesan Ramamurthy, Dennis Wei, Kush R Varshney, Skyler Speakman, Zairah Mustahsan, and Supriyo Chakraborty. 2019. Fair transfer learning with missing protected attributes. In Proceedings of the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, Honolulu, HI, USA.
  • data.gov ([n.d.]) data.gov. [n.d.]. Consumer Credit Trends. https://www.consumerfinance.gov/data-research/consumer-credit-trends/credit-cards/.
  • data.gov US Housing and Department (2019) data.gov US Housing and Urban Development Department. February 22, 2019. Fair Housing Act Cases Filed by Year and State. https://catalog.data.gov/dataset/fair-housing-act-cases-filed-by-year-and-state.
  • De-Arteaga et al. (2019) Maria De-Arteaga, Alexey Romanov, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, and Adam Tauman Kalai. 2019. Bias in bios: A case study of semantic representation bias in a high-stakes setting. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 120–128.
  • Dezsö and Ross (2012) Cristian L Dezsö and David Gaddis Ross. 2012. Does female representation in top management improve firm performance? A panel data investigation. Strategic Management Journal 33, 9 (2012), 1072–1089.
  • Dua and Graff (2017) Dheeru Dua and Casey Graff. 2017. UCI Machine Learning Repository. http://archive.ics.uci.edu/ml.
  • Dwork et al. (2012) Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. 2012. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference. ACM, 214–226.
  • Enjuris (2019) Enjuris. March 5, 2019. Law School Rankings by Female Enrollment (2018). https://www.enjuris.com/students/law-school-female-enrollment-2018.html.
  • Facebook (2019) Facebook. 2019. Diversity Report. https://diversity.fb.com/read-report/. [Online; accessed 21-August-2019].
  • Feldman et al. (2015) Michael Feldman, Sorelle A Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 259–268.
  • for Molecular Biology and Evolution ([n.d.]) Society for Molecular Biology and Evolution. [n.d.]. Undergraduate Travel and Mentoring Award. Retrieved August 22, 2019 from https://www.smbe.org/smbe/AWARDS/AnnualMeetingTravelAwards/UndergraduateTravelandMentoringAward.aspx
  • Common Crawl Foundation ([n.d.]) Common Crawl Foundation. [n.d.]. Common Crawl. http://commoncrawl.org
  • Friedler et al. (2019) Sorelle A Friedler, Carlos Scheidegger, Suresh Venkatasubramanian, Sonam Choudhary, Evan P Hamilton, and Derek Roth. 2019. A comparative study of fairness-enhancing interventions in machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 329–338.
  • Garg et al. (2018) Nikhil Garg, Londa Schiebinger, Dan Jurafsky, and James Zou. 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences 115, 16 (2018), E3635–E3644.
  • Goel and Faltings (2019) Naman Goel and Boi Faltings. 2019. Crowdsourcing with Fairness, Diversity and Budget Constraints. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 297–304.
  • Goldweber et al. (2013) Michael Goldweber, John Barr, Tony Clear, Renzo Davoli, Samuel Mann, Elizabeth Patitsas, and Scott Portnoff. 2013. A framework for enhancing the social good in computing education: a values approach. ACM Inroads 4, 1 (2013), 58–79.
  • Green and Chen (2019) Ben Green and Yiling Chen. 2019. Disparate interactions: An algorithm-in-the-loop analysis of fairness in risk assessments. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 90–99.
  • Hardt et al. (2016) Moritz Hardt, Eric Price, Nati Srebro, et al. 2016. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems. 3315–3323.
  • Heidari et al. (2018) Hoda Heidari, Michele Loi, Krishna P Gummadi, and Andreas Krause. 2018. A moral framework for understanding of fair ml through economic models of equality of opportunity. arXiv preprint arXiv:1809.03400 (2018).
  • Holstein et al. (2019) Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudik, and Hanna Wallach. 2019. Improving fairness in machine learning systems: What do industry practitioners need?. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. ACM, 600.
  • Hunt et al. (2015) Vivian Hunt, Dennis Layton, and Sara Prince. 2015. Diversity matters. McKinsey & Company 1 (2015), 15–29.
  • Ibarra (1995) Herminia Ibarra. 1995. Race, opportunity, and diversity of social circles in managerial networks. Academy of management journal 38, 3 (1995), 673–703.
  • Ibarrarán et al. (2017) Pablo Ibarrarán, Nadin Medellín, Ferdinando Regalia, Marco Stampini, Sandro Parodi, Luis Tejerina, Pedro Cueva, and Madiery Vásquez. 2017. How conditional cash transfers work. (2017).
  • Inc. and Inc. ([n.d.]) Google Inc. and Gallup Inc. [n.d.]. Diversity Gaps in Computer Science: Exploring the Underrepresentation of Girls, Blacks and Hispanics (2016). Retrieved August 20, 2019 from http://goo.gl/PG34aH
  • Jackson et al. (2019) Latifa Jackson, Caitlin Kuhlman, Fatimah Jackson, and P Keolu Fox. 2019. Including Vulnerable Populations in the Assessment of Data From Vulnerable Populations. Frontiers in Big Data 2 (06 2019). https://doi.org/10.3389/fdata.2019.00019
  • Kannan et al. (2019) Sampath Kannan, Aaron Roth, and Juba Ziani. 2019. Downstream effects of affirmative action. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 240–248.
  • Kearns et al. (2019) Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. 2019. An empirical study of rich subgroup fairness for machine learning. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 100–109.
  • Kim et al. (2019) Michael P Kim, Amirata Ghorbani, and James Zou. 2019. Multiaccuracy: Black-box post-processing for fairness in classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 247–254.
  • Kusner et al. (2017) Matt J Kusner, Joshua Loftus, Chris Russell, and Ricardo Silva. 2017. Counterfactual fairness. In Advances in Neural Information Processing Systems. 4066–4076.
  • Lachney (2017) Michael Lachney. 2017. Computational communities: African-American cultural capital in computer science education. Computer Science Education 27, 3-4 (2017), 175–196.
  • Lakkaraju et al. (2017) Himabindu Lakkaraju, Jon Kleinberg, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. 2017. The selective labels problem: Evaluating algorithmic predictions in the presence of unobservables. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 275–284.
  • Lowe et al. (2009) Henry J Lowe, Todd A Ferris, Penni M Hernandez, and Susan C Weber. 2009. STRIDE–An integrated standards-based translational research informatics platform. In AMIA Annual Symposium Proceedings, Vol. 2009. American Medical Informatics Association, 391.
  • Madras et al. (2019) David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. 2019. Fairness through causal awareness: Learning causal latent-variable models for biased data. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 349–358.
  • McNamara et al. (2019) Daniel McNamara, Cheng Soon Ong, and Robert C Williamson. 2019. Costs and benefits of fair representation learning. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 263–270.
  • Microsoft (2018) Microsoft. 2018. Workforce Demographic Report. https://www.microsoft.com/en-us/diversity/inside-microsoft/default.aspx. [Online; accessed 21-August-2019].
  • Milli et al. (2019) Smitha Milli, John Miller, Anca D Dragan, and Moritz Hardt. 2019. The Social Cost of Strategic Classification. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 230–239.
  • Moss-Racusin et al. (2012) Corinne A Moss-Racusin, John F Dovidio, Victoria L Brescoll, Mark J Graham, and Jo Handelsman. 2012. Science faculty’s subtle gender biases favor male students. Proceedings of the National Academy of Sciences 109, 41 (2012), 16474–16479.
  • Mullainathan and Obermeyer (2019) Sendhil Mullainathan and Ziad Obermeyer. 2019. Who is Tested for Heart Attack and Who Should Be: Predicting Patient Risk and Physician Error. NBER Working Paper Series 26168 (2019).
  • Newman et al. (2011) Susan D Newman, Jeannette O Andrews, Gayenell S Magwood, Carolyn Jenkins, Melissa J Cox, and Deborah C Williamson. 2011. Peer reviewed: community advisory boards in community-based participatory research: a synthesis of best processes. Preventing chronic disease 8, 3 (2011).
  • Nielsen et al. (2018) Mathias Wullum Nielsen, Carter Walter Bloch, and Londa Schiebinger. 2018. Making gender diversity work for scientific discovery and innovation. Nature Human Behaviour (2018), 1.
  • Noland et al. (2016) Marcus Noland, Tyler Moran, and Barbara R Kotschwar. 2016. Is gender diversity profitable? Evidence from a global survey. Peterson Institute for International Economics Working Paper 16-3 (2016).
  • Noriega-Campero et al. (2018) Alejandro Noriega-Campero, Michiel Bakker, Bernardo Garcia-Bulle, and Alex Pentland. 2018. Active Fairness in Algorithmic Decision Making. arXiv preprint arXiv:1810.00031 (2018).
  • Obermeyer and Mullainathan (2019) Ziad Obermeyer and Sendhil Mullainathan. 2019. Dissecting Racial Bias in an Algorithm that Guides Health Decisions for 70 Million People. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 89–89.
  • United States Department of Justice, Office of Justice Programs, Bureau of Justice Statistics (1998) United States Department of Justice. Office of Justice Programs. Bureau of Justice Statistics. 1998. State Court Processing Statistics, 1990-2006: Felony Defendants in Large Urban Counties. Inter-university Consortium for Political and Social Research.
  • Department of the Interior (2019) Department of the Interior. 2019. Minority Serving Institutions List. https://www.doi.gov/pmb/eeo/doi-minority-serving-institutions-program.
  • Supreme Court of the United States (2009) Supreme Court of the United States. 2009. Ricci v. DeStefano, 557 U.S. 557, 174.
  • Oneto et al. (2019) Luca Oneto, Michele Doninini, Amon Elders, and Massimiliano Pontil. 2019. Taking advantage of multitask learning for fair classification. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 227–237.
  • Owens Emiel and Kenyatta (2012) Emiel Owens, Andrea Shelton, Collette Bloom, and Kenyatta Cavil. 2012. The Significance of HBCUs to the Production of STEM Graduates: Answering the Call. Educational Foundations 26, 3-4 (Fall 2012), 33–47.
  • P. Wesley Schultz and Serpe (2013) P. Wesley Schultz, Paul R. Hernandez, Anna Woodcock, Mica Estrada, Randie C. Chance, Maria Aguilar, and Richard T. Serpe. 2013. Patching the Pipeline: Reducing Educational Disparities in the Sciences Through Minority Training Programs. Educational Evaluation and Policy Analysis 33, 1 (2011), 27. https://doi.org/10.3102/0162373710392371
  • Passi and Barocas (2019) Samir Passi and Solon Barocas. 2019. Problem formulation and fairness. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 39–48.
  • Pfohl et al. (2019) Stephen Pfohl, Ben Marafino, Adrien Coulet, Fatima Rodriguez, Latha Palaniappan, and Nigam H Shah. 2019. Creating fair models of atherosclerotic cardiovascular disease risk. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society. 271–278.
  • Pisarcik (2019) Ian Pisarcik. March 5, 2019. Women Outnumber Men in Law School Classrooms for Third Year in a Row, but Statistics Don’t Tell the Full Story. https://www.jurist.org/commentary/2019/03/pisarcik-women-outnumber-men-in-law-school/.
  • Prentice and Pizer (2013) Julia C. Prentice and Steven D. Pizer. 2013. Delayed Access to Health Care and Mortality. Health Services Research 42, 2 (April 2007), 644–662. https://doi.org/10.1111/j.1475-6773.2006.00626.x
  • Quillian et al. (2017) Lincoln Quillian, Devah Pager, Ole Hexel, and Arnfinn H Midtbøen. 2017. Meta-analysis of field experiments shows no change in racial discrimination in hiring over time. Proceedings of the National Academy of Sciences 114, 41 (2017), 10870–10875.
  • Raji and Buolamwini (2019) Inioluwa Deborah Raji and Joy Buolamwini. 2019. Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, Vol. 1.
  • Richards et al. (2007) Heraldo V Richards, Ayanna F Brown, and Timothy B Forde. 2007. Addressing diversity in schools: Culturally responsive pedagogy. Teaching Exceptional Children 39, 3 (2007), 64–68.
  • Royce (2019) Edward Royce. 2019. Poverty and Power: The Problem of Structural Inequality (3rd Ed.). Rowman and Littlefield, 4501 Forbes Boulevard, Suite 200, Lanham, Maryland 20706.
  • Russell (2019) Chris Russell. 2019. Efficient search for diverse coherent explanations. arXiv preprint arXiv:1901.04909 (2019).
  • Sampath Kannan and Ziani (2019) Sampath Kannan, Aaron Roth, and Juba Ziani. 2019. Downstream Effects of Affirmative Action. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 240–248. https://doi.org/10.1145/3287560.3287578
  • Scott et al. (2015) Kimberly A Scott, Kimberly M Sheridan, and Kevin Clark. 2015. Culturally responsive computing: a theory revisited. Learning, Media and Technology 40, 4 (2015), 412–436.
  • Shivayogi (2013) Preethi Shivayogi. 2013. Vulnerable population and methods for their safeguard. Perspect Clin Res. 4, 1 (Spring 2013), 53–57. https://doi.org/10.4103/2229-3485.106389
  • Speakman et al. (2018) Skyler Speakman, Srihari Sridharan, and Isaac Markus. 2018. Three population covariate shift for mobile phone-based credit scoring. In Proceedings of the 1st ACM SIGCAS Conference on Computing and Sustainable Societies. ACM, 20.
  • Stolte and Emerson (1977) John Stolte and Richard Emerson. 1977. Structural inequality: Position and power in network structures. Behavioral theory in sociology (1977), 117–38.
  • Tienda and Sullivan (2011) Marta Tienda and Teresa Sullivan. 2011. Texas Higher Education Opportunity Project. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011-06-02.
  • Torchia et al. (2011) Mariateresa Torchia, Andrea Calabrò, and Morten Huse. 2011. Women directors on corporate boards: From tokenism to critical mass. Journal of business ethics 102, 2 (2011), 299–317.
  • Uhlmann and Cohen (2005) Eric Luis Uhlmann and Geoffrey L Cohen. 2005. Constructed criteria: Redefining merit to justify discrimination. Psychological Science 16, 6 (2005), 474–480.
  • Vakil and Ayers (2019) Sepehr Vakil and Rick Ayers. 2019. The racial politics of STEM education in the USA: interrogations and explorations.
  • Vakil et al. (2016) Sepehr Vakil, Maxine McKinney de Royston, Na’ilah Suad Nasir, and Ben Kirshner. 2016. Rethinking race and power in design-based research: Reflections from the field. Cognition and Instruction 34, 3 (2016), 194–209.
  • van den Hurk (2019) Anniek van den Hurk. 2019. Interventions in education to prevent STEM pipeline leakage. International Journal of Science Education 41, 2 (2019), 150–164. https://doi.org/10.1080/09500693.2018.1540897
  • Wachter et al. (2017) Sandra Wachter, Brent Mittelstadt, and Chris Russell. 2017. Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR. Harv. JL & Tech. 31 (2017), 841.
  • Wightman (1998) Linda F Wightman. 1998. LSAC National Longitudinal Bar Passage Study. LSAC Research Report Series. (1998).
  • Zweben and Bizot (2019) Stuart Zweben and Betsy Bizot. 2019. 2018 CRA Taulbee Survey. Computing Research News (2019), 4–74.