Can We Trust Race Prediction?

07/17/2023
by Cangyuan Li, et al.

In the absence of sensitive race and ethnicity data, researchers, regulators, and firms alike turn to proxies. In this paper, I train a Bidirectional Long Short-Term Memory (BiLSTM) model on a novel dataset of voter registration data from all 50 US states and create an ensemble that achieves up to 36.8% higher out-of-sample (OOS) F1 scores than the best-performing machine learning models in the literature. Additionally, I construct the most comprehensive database of first name and surname distributions in the US in order to improve the coverage and accuracy of Bayesian Improved Surname Geocoding (BISG) and Bayesian Improved Firstname Surname Geocoding (BIFSG). Finally, I provide the first high-quality benchmark dataset in order to fairly compare existing models and aid future model developers.


