Towards falsifiable interpretability research

10/22/2020
by Matthew L. Leavitt, et al.

Methods for understanding the decisions of, and the mechanisms underlying, deep neural networks (DNNs) typically rely on building intuition by emphasizing sensory or semantic features of individual examples. For instance, methods aim to visualize the components of an input that are "important" to a network's decision, or to measure the semantic properties of single neurons. Here, we argue that interpretability research suffers from an over-reliance on intuition-based approaches that risk (and in some cases have caused) illusory progress and misleading conclusions. We identify a set of limitations that we argue impede meaningful progress in interpretability research, and we examine two popular classes of interpretability methods, saliency-based and single-neuron-based approaches, as case studies for how over-reliance on intuition and a lack of falsifiability can undermine interpretability research. To address these impediments, we propose a framework for strongly falsifiable interpretability research. We encourage researchers to use their intuitions as a starting point for developing and testing clear, falsifiable hypotheses, and we hope that this framework yields robust, evidence-based interpretability methods that generate meaningful advances in our understanding of DNNs.
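To make the critique concrete, below is a minimal sketch (not from the paper) of the kind of intuition-based method the abstract refers to: a vanilla gradient saliency map, where the magnitude of the input gradient is read as per-pixel "importance." The model choice and function names here are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of a gradient-based saliency map, the class of method critiqued.
import torch
import torchvision.models as models

model = models.resnet18(weights=None)  # any image classifier; this choice is an assumption
model.eval()

def gradient_saliency(model, image, target_class):
    """Return |d score_target / d input| as a crude per-pixel importance map."""
    image = image.clone().requires_grad_(True)
    scores = model(image.unsqueeze(0))           # forward pass, shape (1, num_classes)
    scores[0, target_class].backward()           # gradient of the target logit w.r.t. the input
    return image.grad.abs().max(dim=0).values    # collapse channels -> (H, W) saliency map

# Example usage with a random image; in practice such maps are inspected visually,
# which is exactly the kind of intuition-driven evidence the paper argues should be
# turned into explicit, falsifiable hypotheses and tested.
image = torch.rand(3, 224, 224)
saliency = gradient_saliency(model, image, target_class=0)
print(saliency.shape)  # torch.Size([224, 224])
```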


