A Comprehensive Study of Gender Bias in Chemical Named Entity Recognition Models

by Xingmeng Zhao, et al.

Objective. Chemical named entity recognition (NER) models have the potential to impact a wide range of downstream tasks, from identifying adverse drug reactions to general pharmacoepidemiology. However, it is unknown whether these models work the same for everyone. Performance disparities can cause harm rather than the intended good. Hence, in this paper, we measure gender-related performance disparities of chemical NER systems.

Materials and Methods. We develop a framework to measure gender bias in chemical NER models using synthetic data and a newly annotated dataset of over 92,405 words with self-identified gender information from Reddit. We apply and evaluate state-of-the-art biomedical NER models.

Results. Our findings indicate that chemical NER models are biased. The bias tests on both the synthetic dataset and the real-world data reveal multiple fairness issues. For example, for synthetic data, we find that female-related names are generally classified as chemicals, particularly in datasets containing many brand names rather than standard ones. For both datasets, we find consistent fairness issues resulting in substantial performance disparities between female- and male-related data.

Discussion. Our study highlights the issue of biases in chemical NER models. For example, we find that many systems cannot detect contraceptives (e.g., birth control).

Conclusion. Chemical NER models are biased and can be harmful to female-related groups. Therefore, practitioners should carefully consider the potential biases of these models and take steps to mitigate them.
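To make the notion of a gender-related performance disparity concrete, the following is a minimal sketch of one way such a gap could be computed: a micro-averaged F1 score per group over exact-match entity spans, followed by the difference between groups. This is an illustration under stated assumptions, not the authors' framework; the function names (`span_counts`, `group_f1`, `f1_gap`), the span representation, and the toy data are all hypothetical.

```python
from collections import namedtuple

# Hypothetical container for true positives, false positives, false negatives.
Counts = namedtuple("Counts", ["tp", "fp", "fn"])


def span_counts(gold, predicted):
    """Compare gold and predicted chemical spans by exact match."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    return Counts(tp=tp, fp=len(predicted - gold), fn=len(gold - predicted))


def f1_score(counts):
    """F1 from aggregated counts; returns 0.0 when undefined."""
    denom_p = counts.tp + counts.fp
    denom_r = counts.tp + counts.fn
    precision = counts.tp / denom_p if denom_p else 0.0
    recall = counts.tp / denom_r if denom_r else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def group_f1(examples):
    """Micro-averaged F1 over (gold_spans, predicted_spans) pairs for one group."""
    tp = fp = fn = 0
    for gold, predicted in examples:
        c = span_counts(gold, predicted)
        tp, fp, fn = tp + c.tp, fp + c.fp, fn + c.fn
    return f1_score(Counts(tp, fp, fn))


def f1_gap(male_examples, female_examples):
    """Positive values mean the model scores higher on male-related data."""
    return group_f1(male_examples) - group_f1(female_examples)


# Toy data (invented for illustration): spans are (start, end) token offsets.
male_examples = [({(0, 1)}, {(0, 1)})]
female_examples = [({(0, 1), (3, 4)}, {(0, 1), (5, 6)})]

print(f1_gap(male_examples, female_examples))
```

In this sketch, a spurious prediction such as a personal name tagged as a chemical counts as a false positive and lowers precision for that group, while a missed mention (for example, an undetected contraceptive) counts as a false negative and lowers recall.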


