Interpretable Disentanglement of Neural Networks by Extracting Class-Specific Subnetwork

10/07/2019
by Yulong Wang, et al.

We propose a novel perspective for understanding deep neural networks in an interpretable, disentangled form. For each semantic class, we extract a class-specific functional subnetwork from the original full model; the subnetwork has a compressed structure while maintaining comparable prediction performance. The structural representations of the extracted subnetworks reflect the semantic similarities between their corresponding classes. We also apply the extracted subnetworks to visual explanation and adversarial example detection tasks by merely replacing the original full model with class-specific subnetworks. Experiments demonstrate that this intuitive operation can effectively improve saliency accuracy for gradient-based explanation methods and increase the detection rate for confidence score-based adversarial example detection methods.
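To make the idea of "replacing the full model with a class-specific subnetwork" concrete, the following is a minimal sketch on a toy two-layer network. It is not the authors' extraction procedure, which the abstract does not specify: the function `extract_gate` is a hypothetical stand-in heuristic (keep the hidden units with the largest outgoing weight to a class), and the network sizes, random weights, and all names are assumptions made for illustration only.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(x, W1, W2, gate=None):
    """Two-layer ReLU network; `gate` is an optional 0/1 mask over hidden units."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    if gate is not None:
        # Masked units contribute nothing: this is the "subnetwork".
        hidden = [h * g for h, g in zip(hidden, gate)]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W2]
    return softmax(logits)


def extract_gate(W2, cls, keep):
    """Hypothetical heuristic (an assumption, not the paper's method):
    keep the `keep` hidden units with the largest weight into class `cls`."""
    order = sorted(range(len(W2[cls])), key=lambda j: W2[cls][j], reverse=True)
    kept = set(order[:keep])
    return [1 if j in kept else 0 for j in range(len(W2[cls]))]


def subnetwork_confidence(x, W1, W2, gates):
    """Predict with the full model, then re-score the predicted class
    using only that class's subnetwork."""
    full = forward(x, W1, W2)
    pred = max(range(len(full)), key=full.__getitem__)
    sub = forward(x, W1, W2, gates[pred])
    return pred, full[pred], sub[pred]


random.seed(0)
n_in, n_hidden, n_cls = 4, 8, 3
W1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
W2 = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_cls)]

# One binary gate per class, each keeping half of the hidden units.
gates = [extract_gate(W2, c, keep=4) for c in range(n_cls)]

x = [0.5, -0.2, 0.1, 0.9]
pred, full_conf, sub_conf = subnetwork_confidence(x, W1, W2, gates)
print(pred, round(full_conf, 3), round(sub_conf, 3))
```

In a confidence score-based detector, the value `sub_conf` would take the place of `full_conf` before thresholding. Whether that substitution helps depends on how the gates are obtained, which this sketch does not attempt to reproduce.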

