Avoiding bias when inferring race using name-based approaches

04/14/2021
by Diego Kozlowski, et al.

Racial disparity in academia is a widely acknowledged problem. A quantitative understanding of race-based systemic inequalities is an important step toward a more equitable research system. However, few large-scale analyses have been performed on this topic, mostly because of the lack of robust race-disambiguation algorithms. Author-identifying information does not generally include the author's race. Therefore, an algorithm must be employed that uses known information about authors, i.e., their names, to infer their perceived race. Nevertheless, like any other algorithm, racial inference can introduce biases if it is not carefully designed. When research focuses on understanding race-based inequalities, such biases undermine the objectives of the investigation and may perpetuate inequities. The goal of this article is to assess the biases introduced by the different approaches used in name-based racial inference. We use information from the US census and mortgage applications to infer the race of US author names in the Web of Science. We estimate the effects of using given and family names, thresholds or continuous distributions, and imputation. Our results demonstrate that the validity of name-based inference varies by race and ethnicity and that threshold approaches underestimate Black authors and overestimate White authors. We conclude with recommendations to avoid potential biases. This article fills an important research gap that will allow more systematic and unbiased studies on racial disparity in science.
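To make the contrast between thresholds and continuous distributions concrete, here is a minimal Python sketch. It is not the authors' implementation. The names name_a to name_d and every probability below are hypothetical toy values, not census or mortgage figures, and the 0.5 default threshold is an assumed choice. The sketch compares two ways of aggregating per-name distributions P(race | name): summing the full distributions (expected counts) versus assigning each author to their most probable group only when that probability clears a threshold.

```python
from collections import Counter

# Hypothetical toy distributions P(race | name); NOT real census or mortgage data.
NAME_PROBS = {
    "name_a": {"White": 0.70, "Black": 0.25, "Hispanic": 0.03, "Asian": 0.02},
    "name_b": {"White": 0.55, "Black": 0.40, "Hispanic": 0.03, "Asian": 0.02},
    "name_c": {"White": 0.05, "Black": 0.05, "Hispanic": 0.88, "Asian": 0.02},
    "name_d": {"White": 0.03, "Black": 0.02, "Hispanic": 0.02, "Asian": 0.93},
}


def continuous_counts(names, probs):
    """Sum each author's full probability vector, giving expected counts per group."""
    totals = Counter()
    for name in names:
        for race, p in probs[name].items():
            totals[race] += p
    return dict(totals)


def threshold_counts(names, probs, threshold=0.5):
    """Assign each author to their most probable group if that probability
    meets the threshold; otherwise count the author as 'Unassigned'."""
    totals = Counter()
    for name in names:
        race, p = max(probs[name].items(), key=lambda kv: kv[1])
        totals[race if p >= threshold else "Unassigned"] += 1
    return dict(totals)


if __name__ == "__main__":
    authors = ["name_a", "name_b", "name_c", "name_d"]
    print({r: round(v, 2) for r, v in continuous_counts(authors, NAME_PROBS).items()})
    print(threshold_counts(authors, NAME_PROBS))
```

With these toy inputs, the threshold rule assigns both names with substantial Black probability to White, so the Black count drops to zero, while the continuous sum retains about 0.72 expected Black authors. This only illustrates the mechanism behind the direction of bias reported above; it says nothing about its real-world magnitude.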

research
08/26/2022

Race and ethnicity data for first, middle, and last names

We provide the largest compiled publicly available dictionaries of first...
research
08/13/2022

A Study of Demographic Bias in CNN-based Brain MR Segmentation

Convolutional neural networks (CNNs) are increasingly being used to auto...
research
11/27/2018

Fairness Under Unawareness: Assessing Disparity When Protected Class Is Unobserved

Assessing the fairness of a decision making system with respect to a pro...
research
05/12/2022

Addressing Census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements

Prediction of an individual's race and ethnicity plays an important role...
research
04/18/2023

BISG: When inferring race or ethnicity, does it matter that people often live near their relatives?

Bayesian Improved Surname Geocoding (BISG) is a ubiquitous tool for pred...
research
12/07/2021

raceBERT – A Transformer-based Model for Predicting Race and Ethnicity from Names

This paper presents raceBERT – a transformer-based model for predicting ...
research
05/05/2018

Predicting Race and Ethnicity From the Sequence of Characters in a Name

To answer questions about racial inequality, we often need a way to infe...
