Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained Models

10/03/2021
by   Wenqian Ye, et al.

Over the last few years, contextualized pre-trained neural language models such as BERT and GPT have shown significant gains across a wide range of NLP tasks. One way to enhance the robustness of these pre-trained models is to generate and evaluate adversarial examples for data augmentation or adversarial learning. Meanwhile, the gender bias embedded in these models is a serious problem in practical applications. Much prior research has covered gender bias arising from word-level information (e.g., gender-stereotypical occupations), while few studies have investigated sentence-level and implicit cases. In this paper, we propose a method to automatically generate implicit gender bias samples at the sentence level, along with a metric to measure gender bias. Samples generated by our method are evaluated in terms of accuracy, and the metric is used to guide the generation of examples from pre-trained models; these examples can then be used to mount adversarial attacks on pre-trained models. Finally, we discuss how effectively our generated examples reduce gender bias, as a direction for future research.
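The abstract does not spell out the bias metric, but a common family of sentence-level measures compares the probabilities a masked language model assigns to gendered pronouns in the same slot. The sketch below is a hypothetical illustration of that idea, not the paper's actual metric: `gender_bias_score` and `rank_candidates` are invented names, and the probabilities stand in for a masked-LM's predictions (e.g., from a fill-mask query) rather than real model output. Ranking candidates by the magnitude of such a score is one plausible way a metric could guide the selection of adversarial examples.

```python
import math

def gender_bias_score(p_male: float, p_female: float) -> float:
    """Log-ratio bias score for one masked slot.

    0.0 means the model is indifferent between the two pronouns;
    positive values lean male, negative values lean female.
    """
    return math.log(p_male / p_female)

def rank_candidates(candidates):
    """Rank candidate sentences by |bias score|, most biased first.

    A generation loop could keep the top-ranked sentences as
    adversarial examples that expose implicit gender bias.
    """
    return sorted(
        candidates,
        key=lambda c: abs(gender_bias_score(c["p_he"], c["p_she"])),
        reverse=True,
    )

# Toy probabilities standing in for a masked-LM's pronoun predictions.
candidates = [
    {"text": "[MASK] is a nurse.",     "p_he": 0.08, "p_she": 0.55},
    {"text": "[MASK] went to town.",   "p_he": 0.30, "p_she": 0.28},
    {"text": "[MASK] is an engineer.", "p_he": 0.47, "p_she": 0.09},
]
ranked = rank_candidates(candidates)
# The stereotyped occupations rank above the neutral sentence.
```

In a full pipeline, the toy probabilities would come from querying the target pre-trained model itself, so the same score can both measure bias and steer generation toward the most revealing examples.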

