Sparse Interventions in Language Models with Differentiable Masking

12/13/2021
by Nicola De Cao, et al.

There has been a lot of interest in understanding what information is captured by hidden representations of language models (LMs). Typically, interpretation methods i) do not guarantee that the model actually uses the encoded information, and ii) do not discover small subsets of neurons responsible for a considered phenomenon. Inspired by causal mediation analysis, we propose a method that discovers, within a neural LM, a small subset of neurons responsible for a particular linguistic phenomenon, i.e., a subset whose activations cause a change in the corresponding token emission probabilities. We use a differentiable relaxation to approximately search through the combinatorial space. An L_0 regularization term ensures that the search converges to discrete and sparse solutions. We apply our method to analyze subject-verb number agreement and gender bias detection in LSTMs. We observe that it is fast and finds better solutions than the alternative (REINFORCE). Our experiments confirm that each of these phenomena is mediated through a small subset of neurons that do not play any other discernible role.
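The differentiable relaxation with an L_0 penalty described in the abstract is commonly implemented with Hard Concrete gates (Louizos et al., 2018). The sketch below is illustrative, not the authors' implementation: it samples a relaxed binary mask that is differentiable in its parameters yet can take exact 0/1 values, and computes the closed-form expected L_0 penalty that drives the mask toward sparsity. All parameter names and default values here are the standard Hard Concrete choices, assumed for illustration.

```python
import numpy as np

def hard_concrete_sample(log_alpha, beta=0.5, gamma=-0.1, zeta=1.1, seed=None):
    """Sample a relaxed binary mask per neuron (Hard Concrete gate).

    The sample is differentiable in log_alpha (via the reparameterization
    trick), while stretching to [gamma, zeta] and clipping to [0, 1]
    lets gates take exact 0 or 1 values -- i.e., neurons are fully
    masked out or fully kept.
    """
    rng = np.random.default_rng(seed)
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(log_alpha))
    # Binary Concrete sample in (0, 1)
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1.0 - u) + log_alpha) / beta))
    # Stretch and clip ("hard" part): mass accumulates at exactly 0 and 1
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)

def expected_l0(log_alpha, beta=0.5, gamma=-0.1, zeta=1.1):
    """Closed-form expected number of non-zero gates: the L_0 regularizer."""
    return np.sum(1.0 / (1.0 + np.exp(-(log_alpha - beta * np.log(-gamma / zeta)))))

# Hypothetical usage: one gate per hidden unit; intervened hidden state
# would be h * z (mask applied elementwise), and the training objective
# would add lambda * expected_l0(log_alpha) to the task loss.
log_alpha = np.zeros(8)                     # one learnable logit per neuron
z = hard_concrete_sample(log_alpha, seed=0) # relaxed 0/1 mask over neurons
```

During training one would backpropagate through `z` into `log_alpha`; at convergence, the L_0 term pushes most `log_alpha` entries to large negative values, so only the small responsible subset of neurons keeps a non-zero gate.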


