TIDE: Textual Identity Detection for Evaluating and Augmenting Classification and Language Models

09/07/2023
by Emmanuel Klu et al.

Machine learning models can perpetuate unintended biases from unfair and imbalanced datasets. Evaluating and debiasing these datasets and models is especially hard for text, where sensitive attributes such as race, gender, and sexual orientation may not be available. When these models are deployed into society, they can lead to unfair outcomes for historically underrepresented groups. In this paper, we present a dataset coupled with an approach for improving fairness in text classifiers and language models. We create a new, more comprehensive identity lexicon, TIDAL, which includes 15,123 identity terms and associated sense context across three demographic categories. We leverage TIDAL to develop an identity annotation and augmentation tool that can be used to improve the availability of identity context and the effectiveness of ML fairness techniques. We evaluate our approaches using human contributors, and additionally run experiments focused on dataset and model debiasing. Results show our assistive annotation technique improves the reliability and velocity of human-in-the-loop processes. Our dataset and methods uncover more disparities during evaluation and produce fairer models during remediation. These approaches provide a practical path forward for scaling classifier and generative model fairness in real-world settings.
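The abstract describes two capabilities: annotating identity mentions in text against the lexicon, and augmenting datasets to improve the availability of identity context. The paper's actual tooling and the TIDAL schema are not shown on this page, so the following is only a minimal sketch of what a lexicon-driven annotate-and-augment pipeline might look like. The toy lexicon, the IdentityMention type, and both functions are hypothetical illustrations; the real lexicon's sense context (used to disambiguate polysemous terms) is omitted here.

```python
import re
from dataclasses import dataclass

# Toy stand-in for the TIDAL lexicon (hypothetical; the real dataset maps
# 15,123 identity terms to demographic categories plus sense context).
# Each entry: term -> (category, counterpart terms used for augmentation).
TOY_LEXICON = {
    "woman": ("gender", ["man"]),
    "man": ("gender", ["woman"]),
    "gay": ("sexual_orientation", ["straight"]),
    "straight": ("sexual_orientation", ["gay"]),
}

@dataclass
class IdentityMention:
    term: str
    category: str
    start: int
    end: int

def annotate(text, lexicon=TOY_LEXICON):
    """Find whole-word identity-term mentions and tag their category.

    A real tool would use sense context to skip non-identity senses
    (e.g. "straight line"); this sketch matches purely on surface form.
    """
    mentions = []
    for term, (category, _) in lexicon.items():
        for m in re.finditer(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
            mentions.append(IdentityMention(m.group(0), category, m.start(), m.end()))
    return sorted(mentions, key=lambda mn: mn.start)

def augment(text, lexicon=TOY_LEXICON):
    """Produce counterfactual copies of text, swapping each identity term
    for a counterpart from the same demographic category."""
    variants = []
    for mention in annotate(text, lexicon):
        _, counterparts = lexicon[mention.term.lower()]
        for swap in counterparts:
            variants.append(text[:mention.start] + swap + text[mention.end:])
    return variants

if __name__ == "__main__":
    sample = "The gay man spoke first."
    print(annotate(sample))  # mentions of "gay" and "man"
    print(augment(sample))   # ["The straight man spoke first.",
                             #  "The gay woman spoke first."]
```

Counterfactual swaps of this kind are one common way lexicon annotations feed both fairness evaluation (comparing model outputs across variants) and remediation (training on the augmented copies); whether TIDE uses this exact scheme is not stated in the abstract.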

Related research

10/19/2022
Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information
Previous works on the fairness of toxic language classifiers compare the...

03/18/2023
DeAR: Debiasing Vision-Language Models with Additive Residuals
Large pre-trained vision-language models (VLMs) reduce the time for deve...

05/27/2022
Subverting machines, fluctuating identities: Re-learning human categorization
Most machine learning systems that interact with humans construct some n...

03/01/2023
Fairness Evaluation in Text Classification: Machine Learning Practitioner Perspectives of Individual and Group Fairness
Mitigating algorithmic bias is a critical task in the development and de...

06/24/2021
Towards Understanding and Mitigating Social Biases in Language Models
As machine learning methods are deployed in real-world settings such as ...

06/15/2023
Harvard Glaucoma Fairness: A Retinal Nerve Disease Dataset for Fairness Learning and Fair Identity Normalization
Fairness in machine learning is important for societal well-being, but l...

12/31/2020
Fairness in Machine Learning
Machine learning based systems are reaching society at large and in many...
