How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking

04/30/2020
by Nicola De Cao, et al.

Attribution methods assess the contribution of inputs (e.g., words) to the model prediction. One way to do so is erasure: a subset of inputs is considered irrelevant if it can be removed without affecting the model prediction. Despite its conceptual simplicity, erasure is not commonly used in practice. First, the objective is generally intractable, and approximate search or leave-one-out estimates are typically used instead; both approximations may be inaccurate and remain very expensive with modern deep (e.g., BERT-based) NLP models. Second, the method is susceptible to hindsight bias: the fact that a token can be dropped does not mean that the model 'knows' it can be dropped. The resulting pruning is over-aggressive and does not reflect how the model arrives at its prediction. To deal with these two challenges, we introduce Differentiable Masking (DiffMask). DiffMask relies on learning sparse stochastic gates (i.e., masks) that completely mask out subsets of the input while maintaining end-to-end differentiability. The decision to include or disregard an input token is made with a simple linear model based on intermediate hidden layers of the analyzed model. First, this makes the approach efficient at test time, because we predict rather than search. Second, as with probing classifiers, this reveals what the network 'knows' at the corresponding layers. This lets us not only plot attribution heatmaps but also analyze how decisions are formed across network layers. We use DiffMask to study BERT models on sentiment classification and question answering.
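To make the mechanism concrete, below is a minimal sketch of a DiffMask-style gate in PyTorch. It assumes the standard Hard Concrete distribution (Louizos et al., 2018) for the sparse stochastic gates; the class names (HardConcreteGate, DiffMaskProbe), the temperature and stretch values, and the learned baseline vector are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of DiffMask-style differentiable masking.
# Assumes Hard Concrete gates; names and hyperparameters are illustrative.
import torch
import torch.nn as nn


class HardConcreteGate(nn.Module):
    """Stochastic gate in [0, 1] that is exactly 0 or 1 with non-zero
    probability, yet stays differentiable via reparameterization."""

    def __init__(self, temperature: float = 0.2, stretch=(-0.2, 1.2)):
        super().__init__()
        self.temperature = temperature
        self.left, self.right = stretch

    def forward(self, logits: torch.Tensor):
        if self.training:
            # Sample uniform noise and push it through a tempered sigmoid.
            u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + logits)
                              / self.temperature)
        else:
            s = torch.sigmoid(logits / self.temperature)
        # Stretch to (left, right) and clip to [0, 1] so gates hit 0/1 exactly.
        gate = (s * (self.right - self.left) + self.left).clamp(0.0, 1.0)
        # Expected L0 norm: probability the gate is non-zero (sparsity term).
        l0 = torch.sigmoid(
            logits
            - self.temperature * torch.log(torch.tensor(-self.left / self.right))
        )
        return gate, l0


class DiffMaskProbe(nn.Module):
    """Amortized gate predictor: a simple linear layer over an intermediate
    hidden state decides whether each input token can be masked out."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.scorer = nn.Linear(hidden_size, 1)  # one gate logit per token
        self.gate = HardConcreteGate()
        # Learned baseline substituted for masked-out token embeddings.
        self.baseline = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, input_embeds: torch.Tensor, hidden_states: torch.Tensor):
        logits = self.scorer(hidden_states).squeeze(-1)  # [batch, seq]
        gate, l0 = self.gate(logits)                     # [batch, seq]
        # Interpolate between the real embedding and the baseline.
        g = gate.unsqueeze(-1)
        masked = g * input_embeds + (1 - g) * self.baseline
        return masked, l0
```

In a training loop one would freeze the analyzed model, feed it the masked embeddings, and minimize a divergence (e.g., KL) between the masked and original output distributions together with the expected-L0 sparsity term; the paper formulates this as a constrained (Lagrangian) objective rather than a fixed penalty weight. Because the gate logits come from a linear probe over a given layer's hidden states, reading off which tokens the probe can already discard reveals what the network 'knows' at that layer.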



Related research:

06/26/2023 · Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference
Deploying pre-trained transformer models like BERT on downstream tasks i...

04/03/2021 · Exploring the Role of BERT Token Representations to Explain Sentence Probing Results
Several studies have been carried out on revealing linguistic features c...

06/03/2021 · Dissecting Generation Modes for Abstractive Summarization Models via Ablation and Attribution
Despite the prominence of neural abstractive summarization models, we kn...

05/03/2022 · Finding patterns in Knowledge Attribution for Transformers
We analyze the Knowledge Neurons framework for the attribution of factua...

10/14/2019 · Pruning a BERT-based Question Answering Model
We investigate compressing a BERT-based question answering system by pru...

10/22/2021 · Double Trouble: How to not explain a text classifier's decisions using counterfactuals synthesized by masked language models?
Explaining how important each input feature is to a classifier's decisio...

12/13/2021 · Sparse Interventions in Language Models with Differentiable Masking
There has been a lot of interest in understanding what information is ca...
