Race and ethnicity data for first, middle, and last names

08/26/2022
by Evan T. R. Rosenman, et al.

We provide the largest publicly available compiled dictionaries of first, middle, and last names for imputing race and ethnicity using, for example, Bayesian Improved Surname Geocoding (BISG). The dictionaries are based on the voter files of six Southern states that collect self-reported racial data upon voter registration. Our data cover a much larger set of names than any comparable dataset, containing roughly one million first names, 1.1 million middle names, and 1.4 million surnames. Individuals are categorized into five mutually exclusive racial and ethnic groups – White, Black, Hispanic, Asian, and Other – and racial/ethnic counts are provided for every name in each dictionary. Counts can then be normalized row-wise or column-wise to obtain conditional probabilities of race given name or name given race, respectively. These conditional probabilities can then be used for imputation in a data-analytic task for which ground-truth racial and ethnic data are not available.
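The two normalizations described in the abstract can be illustrated with a minimal sketch. It assumes a layout with one row per name and one column per racial/ethnic group. The counts, the `counts` dictionary, and the function names `race_given_name` and `name_given_race` are hypothetical, invented for illustration, and are not taken from the published dictionaries or any accompanying software.

```python
# Hypothetical toy counts, invented for illustration only.
# Assumed layout: one row per name, one column per group, in this order.
RACES = ["White", "Black", "Hispanic", "Asian", "Other"]

counts = {
    "SMITH":  [700, 250, 20, 10, 20],
    "GARCIA": [50, 10, 920, 10, 10],
    "NGUYEN": [10, 5, 5, 970, 10],
}

def race_given_name(counts):
    """Row-wise normalization: divide each name's counts by that name's total."""
    probs = {}
    for name, row in counts.items():
        total = sum(row)
        probs[name] = [c / total for c in row] if total else [0.0] * len(row)
    return probs

def name_given_race(counts):
    """Column-wise normalization: divide each count by its group's column total."""
    col_totals = [sum(col) for col in zip(*counts.values())]
    return {
        name: [c / t if t else 0.0 for c, t in zip(row, col_totals)]
        for name, row in counts.items()
    }

p_race_given_name = race_given_name(counts)  # each row sums to 1
p_name_given_race = name_given_race(counts)  # each column sums to 1
```

In this sketch, each row of `p_race_given_name` sums to one, and each column of `p_name_given_race` sums to one across the names in the table. A method such as BISG would combine probabilities of this kind with geographic information, a step not shown here.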


