How explainable are adversarially-robust CNNs?

05/25/2022
by Mehdi Nourelahi, et al.

Three important criteria for existing convolutional neural networks (CNNs) are (1) test-set accuracy, (2) out-of-distribution accuracy, and (3) explainability. While these criteria have been studied independently, their relationship is unknown. For example, do CNNs with stronger out-of-distribution performance also have stronger explainability? Furthermore, most prior feature-importance studies evaluate methods on only 2-3 common, vanilla ImageNet-trained CNNs, leaving it unknown how these methods generalize to CNNs of other architectures and training algorithms. Here, we perform the first large-scale evaluation of the relations among the three criteria, using 9 feature-importance methods and 12 ImageNet-trained CNNs spanning 3 training algorithms and 5 CNN architectures. We find several important insights and recommendations for ML practitioners. First, adversarially robust CNNs have higher explainability scores on gradient-based attribution methods (but not on CAM-based or perturbation-based methods). Second, AdvProp models, despite being more accurate than both vanilla and robust models, are not superior in explainability. Third, among the 9 feature-attribution methods tested, GradCAM and RISE are consistently the best. Fourth, Insertion and Deletion are biased towards vanilla and robust models, respectively, due to their strong correlation with a CNN's confidence score distribution. Fifth, we did not find a single CNN that is the best on all three criteria, which interestingly suggests that CNNs become harder to interpret as they become more accurate.
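To make the evaluation protocol concrete, below is a minimal sketch of the Deletion metric, one of the perturbation-based scores the abstract refers to (introduced alongside RISE): pixels ranked most important by an attribution map are progressively removed, and the area under the target-class confidence curve is the score. The function name deletion_score, the step count, and the zero baseline are illustrative assumptions rather than the authors' released code; the metric's direct dependence on the model's softmax confidences is what underlies the bias noted in the abstract.

```python
# Hedged sketch of the Deletion metric (Petsiuk et al., RISE-style evaluation).
# Names and defaults here are assumptions for illustration, not the paper's code.
import torch
import torch.nn.functional as F

def deletion_score(model, image, attribution, target, steps=50):
    """Progressively zero out the most important pixels (per the attribution map)
    and average the target-class confidence; a lower score suggests a more
    faithful explanation.

    image:       (1, 3, H, W) RGB input tensor
    attribution: (H, W) importance map, higher = more important
    target:      int, class index whose confidence is tracked
    """
    model.eval()
    h, w = attribution.shape
    # Pixel indices sorted from most to least important.
    order = attribution.flatten().argsort(descending=True)
    n_pixels = h * w
    per_step = max(n_pixels // steps, 1)

    x = image.clone()
    confidences = []
    with torch.no_grad():
        for i in range(0, n_pixels + 1, per_step):
            probs = F.softmax(model(x), dim=1)
            confidences.append(probs[0, target].item())
            # Zero out the next batch of most-important pixels in all channels.
            x.view(1, 3, -1)[..., order[i:i + per_step]] = 0.0
    # Approximate area under the confidence curve, normalized to [0, 1].
    return sum(confidences) / len(confidences)
```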


