Interpretable bias mitigation for textual data: Reducing gender bias in patient notes while maintaining classification performance

03/10/2021
by Joshua R. Minot, et al.

Medical systems in general, and patient treatment decisions and outcomes in particular, are affected by bias based on gender and other demographic elements. As language models are increasingly applied to medicine, there is growing interest in building algorithmic fairness into processes that affect patient care. Much of the work addressing this question has focused on biases encoded in language models, which are statistical estimates of the relationships between concepts derived from distant reading of corpora. Building on this work, we investigate how word choices made by healthcare practitioners and language models interact with regard to bias. We identify and remove gendered language from two clinical-note datasets and describe a new debiasing procedure using BERT-based gender classifiers. We show minimal degradation in health condition classification tasks for low to medium levels of bias removal via data augmentation. Finally, we compare the bias semantically encoded in the language models with the bias empirically observed in health records. This work outlines an interpretable approach for using data augmentation to identify and reduce the potential for bias in natural language processing pipelines.
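
As a rough illustration of what removing gendered language at several levels might look like, the sketch below redacts the top-k entries of a ranked term list from a note. It is a minimal sketch under stated assumptions, not the procedure from the paper: the term list, its ordering, the `[REDACTED]` placeholder, and the function names `remove_gendered_terms` and `augment_at_levels` are all hypothetical, and the abstract does not say how terms are ranked or what replaces them.

```python
import re

# Hypothetical, hand-ordered term list for illustration only; the paper's
# actual term selection and ranking are not given in the abstract.
RANKED_TERMS = ["she", "he", "her", "his", "him", "woman", "man", "female", "male"]


def remove_gendered_terms(note, terms, placeholder="[REDACTED]"):
    """Replace whole-word, case-insensitive matches of `terms` with a placeholder."""
    if not terms:
        # An empty alternation would match the empty string, so return early.
        return note
    # Longest terms first so that longer alternatives are tried before shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(placeholder, note)


def augment_at_levels(note, ranked_terms, levels):
    """Return one redacted copy of `note` per level k, removing the top-k terms."""
    return {k: remove_gendered_terms(note, ranked_terms[:k]) for k in levels}


if __name__ == "__main__":
    example = "She reports her pain improved; the male nurse agreed."
    for k, text in augment_at_levels(example, RANKED_TERMS, [0, 3, 9]).items():
        print(k, text)
```

Each redacted copy could then be passed to a downstream condition classifier to see how performance changes as more terms are removed, which is the kind of trade-off the abstract reports.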

