Don't trust your eyes: on the (un)reliability of feature visualizations

06/07/2023
by Robert Geirhos et al.

How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods underpin much of our knowledge about the internal workings of neural networks, serving as a form of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to "explain" how neural networks process natural images. We underpin this empirical finding with theory, proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. A promising way forward could therefore be the development of networks that enforce certain structures in order to ensure more reliable feature visualizations.
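The "optimization" the abstract refers to is activation maximization: starting from a near-blank input, gradient ascent nudges the pixels toward whatever maximizes a chosen unit's response. A minimal sketch of the idea on a toy linear unit (the setup, names, and parameters here are illustrative, not the paper's method; real feature visualization applies the same loop to a deep network's activations via autograd):

```python
import numpy as np

# Hypothetical toy setup: one linear unit stands in for a network neuron.
rng = np.random.default_rng(0)
w = rng.normal(size=64)           # weights of the unit we want to visualize
x = 0.01 * rng.normal(size=64)    # start from a near-blank "image"

def activation(x, w):
    """Response of the toy unit to input x."""
    return float(w @ x)

start = activation(x, w)

# Gradient ascent: repeatedly nudge the input toward a higher activation.
lr = 0.1
for _ in range(100):
    grad = w                      # d(w @ x)/dx = w for a linear unit
    x = x + lr * grad
    x = np.clip(x, -1.0, 1.0)     # keep the "image" in a valid pixel range

# The optimized input aligns with w, "showing" the unit's preferred pattern.
end = activation(x, w)
```

The paper's point is that the resulting image need not reflect how the network treats natural inputs: a circuit can be constructed so this optimization converges on an arbitrary pattern.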


Related research:
- Exemplary Natural Images Explain CNN Activations Better than Feature Visualizations (10/23/2020)
- Unlocking Feature Visualization for Deeper Networks with MAgnitude Constrained Optimization (06/11/2023)
- Targeted Background Removal Creates Interpretable Feature Visualizations (06/22/2023)
- How Well do Feature Visualizations Support Causal Understanding of CNN Activations? (06/23/2021)
- Demystifying Brain Tumour Segmentation Networks: Interpretability and Uncertainty Analysis (09/03/2019)
- Illuminated Decision Trees with Lucid (09/03/2019)
- Inverting the Feature Visualization Process for Feedforward Neural Networks (07/21/2020)
