Behind the Mask: Demographic bias in name detection for PII masking

05/09/2022
by Courtney Mansfield, et al.

Many datasets contain personally identifiable information, or PII, which poses privacy risks to individuals. PII masking is commonly used to redact personal information such as names, addresses, and phone numbers from text data. Most modern PII masking pipelines involve machine learning algorithms. However, these systems may vary in performance, such that individuals from particular demographic groups bear a higher risk of having their personal information exposed. In this paper, we evaluate the performance of three off-the-shelf PII masking systems on name detection and redaction. We generate data using names and templates from the customer service domain. We find that an open-source RoBERTa-based system shows fewer disparities than the commercial models we test. However, all systems demonstrate significant differences in error rate based on demographics. In particular, the highest error rates occur for names associated with Black and Asian/Pacific Islander individuals.
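
The evaluation idea in the abstract (fill templates with names, run a masker, compare error rates across groups) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the templates, the names, the group labels, and `toy_masker` are hypothetical stand-ins, not the paper's data, systems, or metric definitions.

```python
from typing import Callable, Dict, List

# Hypothetical customer-service style templates; not the paper's templates.
TEMPLATES: List[str] = [
    "Hi, my name is {name} and I need help with my order.",
    "Please update the account for {name}.",
]

# Hypothetical names and placeholder group labels; not the paper's name lists.
NAMES_BY_GROUP: Dict[str, List[str]] = {
    "group_a": ["Alice", "Bob"],
    "group_b": ["Chidi", "Mei"],
}


def toy_masker(text: str) -> str:
    """Stand-in for a real PII masker: it only redacts a fixed list of names."""
    for known in ("Alice", "Bob", "Mei"):
        text = text.replace(known, "[NAME]")
    return text


def miss_rate_by_group(
    masker: Callable[[str], str],
    templates: List[str],
    names_by_group: Dict[str, List[str]],
) -> Dict[str, float]:
    """Return, per group, the fraction of sentences where the name survives masking."""
    rates: Dict[str, float] = {}
    for group, names in names_by_group.items():
        misses = 0
        total = 0
        for name in names:
            for template in templates:
                masked = masker(template.format(name=name))
                total += 1
                if name in masked:  # the name was not redacted
                    misses += 1
        rates[group] = misses / total
    return rates


rates = miss_rate_by_group(toy_masker, TEMPLATES, NAMES_BY_GROUP)
print(rates)
```

In this toy setup the masker misses one of the two `group_b` names in every template, so that group's miss rate is higher than `group_a`'s. A real audit would swap in an actual masking system and would need a statistical test before calling any gap significant.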

research
05/18/2023

In the Name of Fairness: Assessing the Bias in Clinical Record De-identification

Data sharing is crucial for open science and reproducible research, but ...
research
08/14/2023

Using Text Injection to Improve Recognition of Personal Identifiers in Speech

Accurate recognition of specific categories, such as persons' names, dat...
research
12/29/2017

Personal Names in Modern Turkey

We analyzed the most common 5000 male and 5000 female Turkish names base...
research
01/22/2021

HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition

Methods for linking individuals across historical data sets, typically i...
research
02/03/2021

BiasFinder: Metamorphic Test Generation to Uncover Bias for Sentiment Analysis Systems

Artificial Intelligence (AI) software systems, such as Sentiment Analysi...
research
02/14/2023

Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection

Algorithmic bias often arises as a result of differential subgroup valid...
research
10/27/2020

It's All in the Name: A Character Based Approach To Infer Religion

Demographic inference from text has received a surge of attention in the...
