Gender bias in (non)-contextual clinical word embeddings for stereotypical medical categories

08/02/2022
by Gizem Soğancıoğlu et al.

Clinical word embeddings are extensively used in various Bio-NLP problems as a state-of-the-art feature vector representation. Although they are quite successful at the semantic representation of words, they may exhibit gender stereotypes because the datasets on which they are trained potentially carry statistical and societal bias. This study analyses the gender bias of clinical embeddings on three medical categories: mental disorders, sexually transmitted diseases, and personality traits. To this end, we analyze two different pre-trained embeddings, namely (contextualized) clinical-BERT and (non-contextualized) BioWordVec. We show that both embeddings are biased towards sensitive gender groups, but BioWordVec exhibits a higher bias than clinical-BERT for all three categories. Moreover, our analyses show that clinical embeddings carry a high degree of bias for some medical terms and diseases, which conflicts with the medical literature. Such ill-founded associations might cause harm in downstream applications that use clinical embeddings.
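A common way to quantify this kind of association bias in static embeddings such as BioWordVec is to compare a term's cosine similarity to female-attribute words against its similarity to male-attribute words. The sketch below is illustrative only: it uses tiny toy vectors in place of real BioWordVec embeddings, and the word lists and scoring function are assumptions for demonstration, not the authors' exact method.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def gender_association(term_vec, female_vecs, male_vecs):
    # Mean similarity to the female-attribute set minus mean similarity
    # to the male-attribute set: positive => the term leans "female".
    f = np.mean([cosine(term_vec, v) for v in female_vecs])
    m = np.mean([cosine(term_vec, v) for v in male_vecs])
    return f - m

# Toy 3-d vectors standing in for real word embeddings (hypothetical).
female = [np.array([1.0, 0.1, 0.0]), np.array([0.9, 0.2, 0.1])]
male   = [np.array([0.0, 1.0, 0.1]), np.array([0.1, 0.9, 0.0])]
depression = np.array([0.8, 0.3, 0.2])  # hypothetical term vector

score = gender_association(depression, female, male)
print(round(score, 3))  # positive here: the toy term leans toward the female set
```

With real embeddings, a persistent nonzero score for a disease term (e.g. a mental-disorder name) across many attribute-word pairs is the kind of stereotyped association the study reports.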

Related research

12/15/2022  The effects of gender bias in word embeddings on depression prediction
03/11/2020  Hurtful Words: Quantifying Biases in Clinical Contextual Word Embeddings
05/03/2020  Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation
03/24/2020  Can Embeddings Adequately Represent Medical Terminology? New Large-Scale Medical Term Similarity Datasets Have the Answer!
08/07/2019  Debiasing Embeddings for Reduced Gender Bias in Text Classification
06/01/2022  Assessing Group-level Gender Bias in Professional Evaluations: The Case of Medical Student End-of-Shift Feedback
11/19/2020  Exploring Text Specific and Blackbox Fairness Algorithms in Multimodal Clinical NLP
