TCAV: Relative concept importance testing with Linear Concept Activation Vectors

11/30/2017
by Been Kim, et al.

Neural networks commonly offer high utility but remain difficult to interpret. Developing methods to explain their decisions is challenging due to their large size, complex structure, and inscrutable internal representations. This work argues that the language of explanations should be expanded from that of input features (e.g., assigning importance weights to pixels) to include that of higher-level, human-friendly concepts. For example, an understandable explanation of why an image classifier outputs the label "zebra" would ideally relate to concepts such as "stripes" rather than to a set of particular pixel values. This paper introduces the "concept activation vector" (CAV), which allows quantitative analysis of a concept's relative importance to a classification, with the concept defined by a user-provided set of example inputs. CAVs can be used by non-experts, who need only provide examples, and with CAVs the high-dimensional structure of neural networks becomes an aid to interpretation rather than an obstacle. Using image classification as a testing ground, we describe how CAVs may be used to test hypotheses about classifiers and to generate insights into deficiencies and correlations in training data. CAVs also provide a directed approach to choosing which combinations of neurons to visualize with the DeepDream technique, which has traditionally visualized neurons or linear combinations of neurons chosen at random.
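The core construction is simple enough to sketch. The rough recipe, hedged as a minimal illustration rather than the paper's exact procedure: collect one layer's activations for the concept examples and for a set of random counterexamples, fit a linear classifier to separate the two sets, and take the vector normal to the decision boundary as the CAV; a concept's relative importance to a class can then be probed by checking how often moving an input's activation in the CAV direction increases that class's logit (the sign of the directional derivative). The sketch below assumes hypothetical arrays `concept_acts`, `random_acts`, and `class_grads` holding those activations and gradients, extracted beforehand from the network.

```python
# Minimal sketch of a CAV and a TCAV-style score, assuming precomputed arrays:
#   concept_acts : (n_concept, d) activations of one layer for concept examples
#   random_acts  : (n_random, d) activations of the same layer for random examples
#   class_grads  : (n_inputs, d) gradients of the target class logit w.r.t. that layer
# Array names and the use of scikit-learn are illustrative assumptions, not the
# paper's reference implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression


def compute_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a linear classifier separating concept from random activations;
    the CAV is the unit-normalized normal vector of its decision boundary."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()
    return cav / np.linalg.norm(cav)


def tcav_score(class_grads: np.ndarray, cav: np.ndarray) -> float:
    """Fraction of inputs whose class logit increases when the activation
    is nudged in the concept direction (positive directional derivative)."""
    directional_derivatives = class_grads @ cav
    return float(np.mean(directional_derivatives > 0))
```

With activations and gradients exported from any framework, the computation reduces to a linear fit plus a dot product per input, which is why the only user-facing requirement is a set of example images defining the concept.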

Related research

Disentangling Neuron Representations with Concept Vectors (04/19/2023)
Human-Centered Concept Explanations for Neural Networks (02/25/2022)
Adversarial TCAV – Robust and Effective Interpretation of Intermediate Layers in Neural Networks (02/10/2020)
Neural Activation Patterns (NAPs): Visual Explainability of Learned Concepts (06/20/2022)
Explain Any Concept: Segment Anything Meets Concept-Based Explanation (05/17/2023)
Improving Generalizability in Implicitly Abusive Language Detection with Concept Activation Vectors (04/05/2022)
Interpreting Neural Networks through the Polytope Lens (11/22/2022)
