Coloring the Black Box: Visualizing neural network behavior with a self-introspective model

10/10/2019
by Arturo Pardo, et al.

The following work presents how autoencoding all the possible hidden activations of a network for a given problem can provide insight into its structure, behavior, and vulnerabilities. The method, termed self-introspection, shows that a trained model exhibits similar activation patterns (albeit randomly distributed due to initialization) when shown data belonging to the same category, and that classification errors occur in fringe areas where the activations are less clearly defined. This suggests some form of random, slowly varying, implicit encoding within deep networks that can be observed with this representation. Additionally, obtaining a low-dimensional representation of all the activations allows for (1) real-time model evaluation in the context of a multiclass classification problem, (2) the rearrangement of all hidden layers by their relevance in obtaining a specific output, and (3) a framework for studying possible countermeasures to noise and adversarial attacks. Self-introspection can also show how damaged input data modify the hidden activations, producing an erroneous response. A few illustrative examples are implemented for feedforward and convolutional models on the MNIST and CIFAR-10 datasets, showcasing the method's capabilities as a model evaluation framework.
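As a rough illustration of the pipeline the abstract describes (capture every hidden activation of a trained classifier, then autoencode the concatenated activation vector to a low-dimensional code for visualization), the sketch below shows one way this could look in PyTorch. All names (`capture_activations`, `IntrospectionAE`), the layer sizes, and the 2-D bottleneck are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the self-introspection idea: record every hidden
# activation of a trained classifier, concatenate them per sample, and
# autoencode the result down to 2-D for visualization. Architectures and
# names here are illustrative choices, not the paper's exact method.
import torch
import torch.nn as nn

# A small feedforward MNIST classifier (assumed already trained).
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

def capture_activations(model, x):
    """Run x through the model, recording the output of every ReLU layer."""
    acts = []
    hooks = [m.register_forward_hook(lambda _m, _i, out: acts.append(out))
             for m in model if isinstance(m, nn.ReLU)]
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    # One flat activation vector per sample: (batch, 128 + 64).
    return torch.cat([a.flatten(1) for a in acts], dim=1)

class IntrospectionAE(nn.Module):
    """Autoencoder compressing the full activation vector to a 2-D code."""
    def __init__(self, dim_in, dim_code=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(),
                                 nn.Linear(64, dim_code))
        self.dec = nn.Sequential(nn.Linear(dim_code, 64), nn.ReLU(),
                                 nn.Linear(64, dim_in))
    def forward(self, x):
        code = self.enc(x)
        return self.dec(code), code

x = torch.randn(256, 1, 28, 28)            # stand-in for MNIST images
acts = capture_activations(classifier, x)  # (256, 192)

ae = IntrospectionAE(acts.shape[1])
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(200):                       # fit the autoencoder
    recon, _ = ae(acts)
    loss = nn.functional.mse_loss(recon, acts)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    _, code = ae(acts)                     # 2-D coordinates, one per input
# Plotting `code` colored by the classifier's predicted label would reveal
# the per-class activation clusters (and fringe regions) described above.
```

Scatter-plotting the 2-D codes, colored by predicted class, is the kind of real-time visualization the abstract refers to; noisy or adversarial inputs would then appear as points drifting toward the fringe regions between clusters.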

Related research

Towards adversarial robustness with 01 loss neural networks (08/20/2020)
Motivated by the general robustness properties of the 01 loss we propose...

Prior Activation Distribution (PAD): A Versatile Representation to Utilize DNN Hidden Units (07/05/2019)
In this paper, we introduce the concept of Prior Activation Distribution...

Invisible Backdoor Attacks Against Deep Neural Networks (09/06/2019)
Deep neural networks (DNNs) have been proven vulnerable to backdoor atta...

Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations (03/05/2020)
We describe a procedure for removing dependency on a cohort of training ...

Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations (04/07/2018)
Deep networks have achieved impressive results across a variety of impor...

Ensemble Deep Learning on Large, Mixed-Site fMRI Datasets in Autism and Other Tasks (02/14/2020)
Deep learning models for MRI classification face two recurring problems:...

Examining the Proximity of Adversarial Examples to Class Manifolds in Deep Networks (04/12/2022)
Deep neural networks achieve remarkable performance in multiple fields. ...
