Towards Interpretable Deep Neural Networks by Leveraging Adversarial Examples

01/25/2019
by Yinpeng Dong, et al.

Sometimes it is not enough for a DNN to produce an outcome: in applications such as healthcare, users need to understand the rationale behind its decisions. It is therefore imperative to develop algorithms that learn models with good interpretability (Doshi-Velez 2017). An important factor behind the lack of interpretability of DNNs is the ambiguity of neurons, where a single neuron may fire for various unrelated concepts. This work aims to increase the interpretability of DNNs over the whole image space by reducing this ambiguity. In this paper, we make the following contributions: 1) we propose a metric that quantitatively evaluates the consistency level of neurons in a network; 2) by leveraging adversarial examples, we show that the features learned by individual neurons are ambiguous; and 3) we propose to improve the consistency of neurons on the adversarial example subset through an adversarial training algorithm with a consistent loss.
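The abstract does not spell out the consistency metric of contribution 1). Below is a minimal sketch of one plausible formulation, assuming a neuron counts as consistent when the inputs that activate it most strongly agree on a class label; the function name neuron_consistency and the top-k construction are illustrative assumptions, not the paper's definition.

```python
# Plausible neuron-consistency metric (a reading of the abstract, not the
# paper's exact formulation): for each neuron, take the inputs that excite
# it most strongly and measure how often they share one class label. A
# neuron firing for many unrelated classes scores low; a concept-aligned
# neuron scores high.
import torch
from collections import Counter

@torch.no_grad()
def neuron_consistency(activations: torch.Tensor,
                       labels: torch.Tensor,
                       top_k: int = 50) -> torch.Tensor:
    """activations: (N, D) responses of D neurons to N inputs;
    labels: (N,) class labels. Returns a (D,) consistency score."""
    n, d = activations.shape
    scores = []
    for j in range(d):
        # indices of the top_k inputs that excite neuron j the most
        top = torch.topk(activations[:, j], k=min(top_k, n)).indices
        counts = Counter(labels[top].tolist())
        # fraction of top-activating inputs agreeing on the majority class
        scores.append(max(counts.values()) / len(top))
    return torch.tensor(scores)
```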
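Likewise, a hedged sketch of contribution 3), adversarial training with an added consistent-loss term. It assumes the consistent loss penalizes the distance between a chosen layer's activations on clean and adversarial inputs, uses FGSM as a stand-in attack, and introduces the hypothetical features callable and weight lam; the paper's actual loss and attack may differ.

```python
# Sketch of adversarial training with a "consistent loss" term, under the
# assumptions stated above. FGSM is used only as a stand-in attack.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """One-step FGSM attack: perturb x along the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def train_step(model, features, x, y, optimizer, lam=1.0):
    """features(x) returns the intermediate activations whose clean/adversarial
    agreement we enforce (a hypothetical hook, not from the paper)."""
    x_adv = fgsm(model, x, y)
    optimizer.zero_grad()
    # standard classification losses on clean and adversarial inputs
    ce = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    # consistent loss: pull adversarial activations toward the clean ones
    consistent = F.mse_loss(features(x_adv), features(x).detach())
    (ce + lam * consistent).backward()
    optimizer.step()
```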

Related research

09/16/2019
Interpreting and Improving Adversarial Robustness with Neuron Sensitivity
Deep neural networks (DNNs) are vulnerable to adversarial examples where...

09/21/2022
Toy Models of Superposition
Neural networks often pack many unrelated concepts into a single neuron ...

10/22/2020
Towards falsifiable interpretability research
Methods for understanding the decisions of and mechanisms underlying dee...

11/18/2018
Regularized adversarial examples for model interpretability
As machine learning algorithms continue to improve, there is an increasi...

04/26/2023
Concept-Monitor: Understanding DNN training through individual neurons
In this work, we propose a general framework called Concept-Monitor to h...

09/27/2019
BEAN: Interpretable Representation Learning with Biologically-Enhanced Artificial Neuronal Assembly Regularization
Deep neural networks (DNNs) are known for extracting good representation...

03/12/2017
Improving Interpretability of Deep Neural Networks with Semantic Information
Interpretability of deep neural networks (DNNs) is essential since it en...
