Evaluating Attribution Methods using White-Box LSTMs
Interpretability methods for neural networks are difficult to evaluate because we do not understand the black-box models typically used to test them. This paper proposes a framework in which interpretability methods are evaluated using manually constructed networks, which we call white-box networks, whose behavior is understood a priori. We evaluate five methods for producing attribution heatmaps by applying them to white-box LSTM classifiers for tasks based on formal languages. Although our white-box classifiers solve their tasks perfectly and transparently, we find that all five attribution methods fail to produce the expected model explanations.
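To make the setup concrete, here is a minimal sketch (not the paper's code) of applying one representative attribution method, gradient x input, to an LSTM classifier over a toy two-symbol formal-language alphabet. The model class, vocabulary, and dimensions below are illustrative assumptions; the paper's white-box networks instead have manually constructed weights whose behavior is understood a priori, which is what makes the expected explanation known in advance.

```python
# Hypothetical sketch: gradient x input attribution for an LSTM
# classifier on a formal-language task. Randomly initialized here;
# the paper hand-constructs the LSTM weights so the correct
# explanation is known before the attribution method is run.
import torch
import torch.nn as nn

VOCAB = {"a": 0, "b": 1}  # assumed two-symbol alphabet

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=2, embed_dim=8, hidden_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)  # binary accept/reject

    def forward(self, token_ids):
        emb = self.embed(token_ids)  # (batch, seq, embed_dim)
        emb.retain_grad()            # keep gradients for attribution
        h, _ = self.lstm(emb)
        logits = self.out(h[:, -1])  # classify from final hidden state
        return logits, emb

def input_x_gradient(model, token_ids, target_class):
    """Gradient x input, summed over embedding dimensions,
    giving one relevance score per input token."""
    logits, emb = model(token_ids)
    logits[0, target_class].backward()
    return (emb.grad * emb).sum(dim=-1).squeeze(0)  # (seq_len,)

model = LSTMClassifier()
ids = torch.tensor([[VOCAB[c] for c in "aabba"]])
heatmap = input_x_gradient(model, ids, target_class=1)
print(heatmap)  # one attribution score per input token
```

In the paper's framework, a heatmap like this would be compared against the tokens the hand-built classifier provably relies on; the reported finding is that all five tested methods fail to recover those expected explanations.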