Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces

12/30/2022
by Pattarawat Chormai, et al.

Explainable AI transforms opaque decision strategies of ML models into explanations that are interpretable by the user, for example, by identifying the contribution of each input feature to the prediction at hand. Such explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by finding relevant subspaces in activation space that can be mapped to more abstract, human-understandable concepts and enable a joint attribution on concepts and input features. To automatically extract the desired representation, we propose new subspace analysis formulations that extend the principle of PCA and subspace analysis to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), optimize the relevance of projected activations rather than the more traditional variance or kurtosis. This enables a much stronger focus on subspaces that are truly relevant to the prediction and the explanation, in particular by ignoring activations or concepts to which the prediction model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP. Our proposed methods prove to be practically useful and compare favorably to the state of the art, as demonstrated on benchmarks and in three use cases.
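The core idea can be sketched as a relevance-weighted analogue of PCA: instead of diagonalizing the covariance of activations, one diagonalizes a symmetrized cross-covariance between activations and a relevance-carrying signal, so that the retained directions matter for the explanation rather than merely for variance. The snippet below is a minimal illustrative sketch, assuming that relevance at a layer factorizes elementwise into activation times a context (gradient-like) vector; the function name `prca_subspace` and this factorization are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def prca_subspace(A, C, k):
    """Illustrative PCA-like extraction of a 'relevant' subspace.

    A : (n, d) array of layer activations for n samples.
    C : (n, d) array of context vectors; relevance per sample is
        assumed (for illustration) to factor as R_i = A_i * C_i.
    k : dimension of the subspace to retain.

    Returns U, a (d, k) orthonormal basis that maximizes the projected
    relevance sum_i (U^T A_i) . (U^T C_i), instead of the projected
    variance maximized by ordinary PCA.
    """
    # Symmetrized cross-covariance between activations and context.
    M = 0.5 * (A.T @ C + C.T @ A) / A.shape[0]
    # Its top-k eigenvectors span the candidate relevant subspace.
    eigval, eigvec = np.linalg.eigh(M)
    order = np.argsort(eigval)[::-1]
    return eigvec[:, order[:k]]


# Toy usage with random data standing in for real activations/contexts.
rng = np.random.default_rng(0)
A = rng.normal(size=(500, 64))
C = rng.normal(size=(500, 64))
U = prca_subspace(A, C, k=4)

# Concept-level attribution idea: project both factors onto the subspace
# and recombine, yielding per-direction relevance scores per sample.
concept_relevance = (A @ U) * (C @ U)   # shape (500, 4)
print(U.shape, concept_relevance.shape)
```

Note that this sketch only covers a single subspace; DRSA, as described in the abstract, additionally disentangles relevance across several such subspaces, which the snippet does not attempt to reproduce.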


Related research

06/07/2022
From "Where" to "What": Towards Human-Understandable Explanations through Concept Relevance Propagation
The emerging field of eXplainable Artificial Intelligence (XAI) aims to ...

03/11/2022
Sparse Subspace Clustering for Concept Discovery (SSCCD)
Concepts are key building blocks of higher level human understanding. Ex...

11/21/2022
Revealing Hidden Context Bias in Segmentation and Object Detection through Concept-specific Explanations
Applying traditional post-hoc attribution methods to segmentation or obj...

09/21/2021
Interpretable Directed Diversity: Leveraging Model Explanations for Iterative Crowd Ideation
Feedback can help crowdworkers to improve their ideations. However, curr...

05/11/2021
Rationalization through Concepts
Automated predictions require explanations to be interpretable by humans...

06/15/2023
Improving Explainability of Disentangled Representations using Multipath-Attribution Mappings
Explainable AI aims to render model behavior understandable by humans, w...

07/14/2023
Visual Explanations with Attributions and Counterfactuals on Time Series Classification
With the rising necessity of explainable artificial intelligence (XAI), ...
