Counteracts: Testing Stereotypical Representation in Pre-trained Language Models

01/11/2023
by Damin Zhang, et al.

Language models have demonstrated strong performance on various natural language understanding tasks. Like humans, language models can also acquire biases from their training data. As more downstream tasks integrate language models into their pipelines, it is necessary to understand their internal stereotypical representations and the methods available to mitigate the negative effects. In this paper, we propose a simple method for testing the internal stereotypical representations of pre-trained language models using counterexamples. We focus mainly on gender bias, but the method can be extended to other types of bias. We evaluate models on nine different cloze-style prompts consisting of knowledge prompts and base prompts. Our results indicate that pre-trained language models show a certain amount of robustness to unrelated knowledge and rely on shallow linguistic cues, such as word position and syntactic structure, to alter their internal stereotypical representations. These findings shed light on how to manipulate language models in a neutral way for both fine-tuning and evaluation.
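The cloze-style probing the abstract describes can be sketched as comparing a masked language model's probabilities for stereotypical versus counter-stereotypical fillers of a masked slot. Below is a minimal, hypothetical illustration: the prompts, the pronoun pair, and the probability table are placeholders standing in for a real fill-mask model's output (e.g. BERT's masked-token head), not the paper's actual prompts or data.

```python
import math

# Hypothetical [MASK]-slot probabilities, mocking what a masked language
# model would return for each prompt. Prompts and numbers are illustrative.
MOCK_MLM_PROBS = {
    "[MASK] works as a nurse.": {"She": 0.62, "He": 0.21},
    "[MASK] works as a mechanic.": {"She": 0.08, "He": 0.71},
}

def gender_bias_score(probs):
    """Log-ratio of female vs. male pronoun probability in the masked slot.

    Positive -> female-leaning, negative -> male-leaning, zero -> neutral.
    """
    return math.log(probs["She"] / probs["He"])

for prompt, probs in MOCK_MLM_PROBS.items():
    print(f"{prompt}  bias = {gender_bias_score(probs):+.2f}")
```

A counterexample-based test in this spirit would then prepend knowledge or base prompts (related or unrelated context) and measure how much the bias score shifts.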


