Counteracts: Testing Stereotypical Representation in Pre-trained Language Models

01/11/2023
by Damin Zhang, et al.

Language models have demonstrated strong performance on various natural language understanding tasks. Like humans, language models can also acquire biases from their training data. As more downstream tasks integrate language models into their pipelines, it is necessary to understand their internal stereotypical representations and the methods available to mitigate the negative effects. In this paper, we propose a simple method for testing the internal stereotypical representations of pre-trained language models using counterexamples. We focus mainly on gender bias, but the method can be extended to other types of bias. We evaluate models on nine different cloze-style prompts consisting of knowledge prompts and base prompts. Our results indicate that pre-trained language models show a certain amount of robustness to unrelated knowledge and rely on shallow linguistic cues, such as word position and syntactic structure, to alter their internal stereotypical representations. These findings shed light on how to manipulate language models in a neutral way for both fine-tuning and evaluation.
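The cloze-style probing the abstract describes can be sketched as comparing a masked language model's probabilities for stereotypical versus counter-stereotypical fillers of a masked slot. Below is a minimal, hypothetical illustration: the prompts, the pronoun pair, and the probability table are placeholders standing in for a real fill-mask model's output (e.g. BERT's masked-token head), not the paper's actual prompts or data.

```python
import math

# Hypothetical [MASK]-slot probabilities, mocking what a masked language
# model would return for each prompt. Prompts and numbers are illustrative.
MOCK_MLM_PROBS = {
    "[MASK] works as a nurse.": {"She": 0.62, "He": 0.21},
    "[MASK] works as a mechanic.": {"She": 0.08, "He": 0.71},
}

def gender_bias_score(probs):
    """Log-ratio of female vs. male pronoun probability in the masked slot.

    Positive -> female-leaning, negative -> male-leaning, zero -> neutral.
    """
    return math.log(probs["She"] / probs["He"])

for prompt, probs in MOCK_MLM_PROBS.items():
    print(f"{prompt}  bias = {gender_bias_score(probs):+.2f}")
```

A counterexample-based test in this spirit would then prepend knowledge or base prompts (related or unrelated context) and measure how much the bias score shifts.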


