Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias

04/26/2020
by Jesse Vig, et al.

Common methods for interpreting neural models in natural language processing typically examine either their structure or their behavior, but not both. We propose a methodology grounded in the theory of causal mediation analysis for interpreting which parts of a model are causally implicated in its behavior. It enables us to analyze the mechanisms by which information flows from input to output through various model components, known as mediators. We apply this methodology to analyze gender bias in pre-trained Transformer language models. We study the role of individual neurons and attention heads in mediating gender bias across three datasets designed to gauge a model's sensitivity to gender bias. Our mediation analysis reveals that gender bias effects are (i) sparse, concentrated in a small part of the network; (ii) synergistic, amplified or repressed by different components; and (iii) decomposable into effects flowing directly from the input and indirectly through the mediators.
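
The direct/indirect decomposition mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the functions `mediator`, `output`, and `effects` below are hypothetical stand-ins, with a single scalar "mediator" playing the role of a neuron or attention head, and effects measured as simple differences in the output. The direct effect changes the input while holding the mediator at its baseline value; the indirect effect keeps the input fixed and sets the mediator to the value it would take under the changed input.

```python
import math


def mediator(x):
    # Hypothetical intermediate component (stands in for a neuron or head).
    return math.tanh(2.0 * x)


def output(x, m):
    # Hypothetical model output: depends on the input directly (0.5 * x)
    # and through the mediator (1.5 * m).
    return 0.5 * x + 1.5 * m


def effects(x_base, x_int):
    """Return (total, direct, indirect) effects of changing x_base to x_int."""
    m_base = mediator(x_base)
    m_int = mediator(x_int)
    y_base = output(x_base, m_base)

    # Total effect: change the input and let the mediator respond.
    total = output(x_int, m_int) - y_base
    # Direct effect: change the input, hold the mediator at its baseline value.
    direct = output(x_int, m_base) - y_base
    # Indirect effect: keep the input, set the mediator to its intervened value.
    indirect = output(x_base, m_int) - y_base
    return total, direct, indirect


total, direct, indirect = effects(0.0, 1.0)
print(f"total={total:.3f} direct={direct:.3f} indirect={indirect:.3f}")
```

Because this toy output is additive in the input and the mediator, the total effect equals the sum of the direct and indirect effects. In a nonlinear model such as a Transformer, that equality is not guaranteed to hold exactly.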

research
05/14/2022

Naturalistic Causal Probing for Morpho-Syntax

Probing has become a go-to methodology for interpreting and analyzing de...
research
12/13/2021

Sparse Interventions in Language Models with Differentiable Masking

There has been a lot of interest in understanding what information is ca...
research
07/31/2018

Gender Bias in Neural Natural Language Processing

We examine whether neural natural language processing (NLP) systems refl...
research
12/09/2021

Word Embeddings via Causal Inference: Gender Bias Reducing and Semantic Information Preserving

With widening deployments of natural language processing (NLP) in daily ...
research
10/15/2021

Detecting Gender Bias in Transformer-based Models: A Case Study on BERT

In this paper, we propose a novel gender bias detection method by utiliz...
research
07/16/2021

Intersectional Bias in Causal Language Models

To examine whether intersectional bias can be observed in language gener...
research
06/28/2022

Towards Lexical Gender Inference: A Scalable Methodology using Online Databases

This paper presents a new method for automatically detecting words with ...
