Causal Analysis for Robust Interpretability of Neural Networks

05/15/2023
by Ola Ahmad et al.

Interpreting the inner workings of neural networks is crucial for the trustworthy development and deployment of these black-box models. Prior interpretability methods focus on correlation-based measures that attribute model decisions to individual examples. However, such measures are susceptible to noise and to spurious correlations encoded in the model during training (e.g., biased inputs, overfitting, or misspecification), and they have been shown to produce noisy, unstable attributions that prevent a transparent understanding of the model's behavior. In this paper, we develop a robust interventional method, grounded in causal analysis, to capture cause-effect mechanisms in pre-trained neural networks and relate them to the prediction. Our novel approach relies on path interventions to infer the causal mechanisms within hidden layers and to isolate the information that is relevant and necessary for the model's prediction while discarding noisy signals. The result is a task-specific causal explanatory graph that can audit model behavior and express the actual causes underlying its performance. We apply our method to vision models trained on image classification and provide extensive quantitative experiments showing that our approach captures more stable and faithful explanations than standard attribution-based methods. Furthermore, the underlying causal graphs reveal the neural interactions in the model, making the method a valuable tool for other applications (e.g., model repair).
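To make the idea of an intervention on hidden activations concrete, here is a minimal sketch (not the paper's actual algorithm): a toy two-layer network in which we apply a do()-style ablation to one hidden unit at a time and record how much the predicted-class logit changes. The network weights, the zero-ablation choice, and the effect measure are all illustrative assumptions; the paper's path interventions operate on paths through a pre-trained model rather than on this toy MLP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer MLP; random weights stand in for a pre-trained network.
W1 = rng.normal(size=(4, 8))   # input -> hidden
W2 = rng.normal(size=(8, 3))   # hidden -> class logits

def forward(x, intervene_unit=None):
    """Forward pass with an optional do()-style intervention that
    clamps one hidden unit to zero (an ablation intervention)."""
    h = np.maximum(x @ W1, 0.0)        # ReLU hidden layer
    if intervene_unit is not None:
        h = h.copy()
        h[intervene_unit] = 0.0        # do(h_i = 0)
    return h @ W2

x = rng.normal(size=4)
base = forward(x)
target = int(np.argmax(base))          # predicted class without intervention

# Causal effect of each hidden unit on the predicted-class logit:
# the drop in that logit when the unit is clamped to zero.
effects = np.array([base[target] - forward(x, i)[target] for i in range(8)])
ranked = np.argsort(-np.abs(effects))  # units ranked by effect magnitude
```

Units with large absolute effect are candidate nodes of a causal explanatory graph for this prediction; units whose ablation leaves the logit unchanged carry no causal information for it and can be pruned.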

Related research

- A Causal Lens for Peeking into Black Box Predictive Models: Predictive Model Interpretation via Causal Attribution (08/01/2020)
- Explaining The Behavior Of Black-Box Prediction Algorithms With Causal Learning (06/03/2020)
- Causal Abstractions of Neural Networks (06/06/2021)
- Interpretability at Scale: Identifying Causal Mechanisms in Alpaca (05/15/2023)
- Causal Explanations of Image Misclassifications (06/28/2020)
- Causal Graphs Underlying Generative Models: Path to Learning with Limited Data (07/14/2022)
- Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP (08/27/2023)
