raceBERT – A Transformer-based Model for Predicting Race and Ethnicity from Names

12/07/2021
by Prasanna Parasurama, et al.

This paper presents raceBERT, a transformer-based model for predicting race and ethnicity from the character sequences in names, along with an accompanying Python package. Trained on a Florida voter registration dataset, the model predicts the likelihood of a name belonging to each of five U.S. census race categories (White, Black, Hispanic, Asian Pacific Islander, American Indian Alaskan Native). I build on Sood and Laohaprapanon (2018) by replacing their LSTM model with transformer-based models (a pre-trained BERT model and a RoBERTa model trained from scratch) and compare the results. To the best of my knowledge, raceBERT achieves state-of-the-art results in race prediction from names, with an average F1-score of 0.86: a 4.1% improvement over the previous state of the art, with improvements of 15-17% for non-white names.
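To make "predicts the likelihood of a name belonging to" each category concrete, here is a minimal sketch of how a classifier's raw scores could be turned into per-category probabilities. It assumes the model ends in a softmax layer, which the abstract does not state. The function names and the example logits are hypothetical, invented for illustration, and this is not the raceBERT package's API.

```python
import math

# The five categories named in the abstract.
CATEGORIES = [
    "White",
    "Black",
    "Hispanic",
    "Asian Pacific Islander",
    "American Indian Alaskan Native",
]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_likelihoods(logits):
    """Pair each category with its probability, most likely first."""
    if len(logits) != len(CATEGORIES):
        raise ValueError("expected one logit per category")
    probs = softmax(logits)
    return sorted(zip(CATEGORIES, probs), key=lambda pair: pair[1], reverse=True)


# Hypothetical logits, invented for illustration only (not model output).
example_logits = [2.0, 0.5, 0.1, -1.0, -2.0]
for label, p in to_likelihoods(example_logits):
    print(f"{label}: {p:.3f}")
```

The output is a ranked list of (category, probability) pairs rather than a single label, which matches the abstract's framing of the prediction as a likelihood over categories.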


