Interpretable BoW Networks for Adversarial Example Detection

01/08/2019
by Krishna Kanth Nakka, et al.

The standard approach to providing interpretability for deep convolutional neural networks (CNNs) consists of visualizing either their feature maps or the image regions that contribute the most to the prediction. In this paper, we introduce an alternative strategy to interpret the results of a CNN. To this end, we leverage a Bag-of-visual-Words (BoW) representation within the network and associate a visual and semantic meaning with the corresponding codebook elements via the use of a generative adversarial network. The reason behind the prediction for a new sample can then be interpreted by looking at the visual representation of the most highly activated codeword. We then propose to exploit our interpretable BoW networks for adversarial example detection. To this end, we build on the intuition that, while adversarial samples look very similar to real images, producing an incorrect prediction requires them to activate codewords whose visual representation differs significantly from the input. We therefore cast the adversarial example detection problem as that of comparing the input image with the visual representation of the most highly activated codeword. As evidenced by our experiments, this allows us to outperform state-of-the-art adversarial example detection methods on standard benchmarks, independently of the attack strategy.
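To make the pipeline concrete, here is a minimal PyTorch sketch of the two ideas in the abstract: a BoW head that soft-assigns convolutional features to a learned codebook, and a detection rule that compares the input with a GAN rendering of the most activated codeword. Everything here (the BoWHead module, the softmax temperature, the generator interface, and the MSE threshold) is an illustrative assumption, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BoWHead(nn.Module):
    """Soft-assigns conv features to K learned codewords and pools
    the assignments into a bag-of-words histogram (sketch)."""
    def __init__(self, feat_dim=512, num_codewords=64, num_classes=10):
        super().__init__()
        self.codebook = nn.Parameter(torch.randn(num_codewords, feat_dim))
        self.classifier = nn.Linear(num_codewords, num_classes)

    def forward(self, feats):                      # feats: (B, C, H, W)
        B, C, H, W = feats.shape
        f = feats.permute(0, 2, 3, 1).reshape(B, H * W, C)
        f = F.normalize(f, dim=-1)                 # unit-norm local features
        cb = F.normalize(self.codebook, dim=-1)    # unit-norm codewords
        sim = f @ cb.t()                           # (B, HW, K) cosine similarities
        assign = F.softmax(sim / 0.1, dim=-1)      # soft assignment per location
        hist = assign.mean(dim=1)                  # (B, K) BoW histogram
        return self.classifier(hist), hist

def detect_adversarial(image, hist, generator, threshold=0.5):
    """Flag the input if it is visually far from the GAN rendering of its
    most activated codeword. `generator` and `threshold` are hypothetical:
    any conditional generator mapping codeword ids to images would do."""
    top_codeword = hist.argmax(dim=1)              # most activated codeword id
    rendering = generator(top_codeword)            # (B, 3, H, W) visualization
    dist = F.mse_loss(image, rendering, reduction='none').mean(dim=(1, 2, 3))
    return dist > threshold                        # True = likely adversarial

The key design point this sketch illustrates is that detection needs no knowledge of the attack: a clean image should resemble the visualization of its winning codeword, while an adversarial one, having hijacked a codeword belonging to another class, should not.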



Related research

03/04/2021
SpectralDefense: Detecting Adversarial Attacks on CNNs in the Fourier Domain
Despite the success of convolutional neural networks (CNNs) in many comp...

10/07/2019
Interpretable Disentanglement of Neural Networks by Extracting Class-Specific Subnetwork
We propose a novel perspective to understand deep neural networks in an ...

11/15/2017
Interpreting Deep Visual Representations via Network Dissection
The success of recent deep convolutional neural networks (CNNs) depends ...

03/15/2023
MRGAN360: Multi-stage Recurrent Generative Adversarial Network for 360 Degree Image Saliency Prediction
Thanks to the ability of providing an immersive and interactive experien...

05/03/2023
New Adversarial Image Detection Based on Sentiment Analysis
Deep Neural Networks (DNNs) are vulnerable to adversarial examples, whil...

07/16/2020
Training Interpretable Convolutional Neural Networks by Differentiating Class-specific Filters
Convolutional neural networks (CNNs) have been successfully used in a ra...

08/13/2021
CODEs: Chamfer Out-of-Distribution Examples against Overconfidence Issue
Overconfident predictions on out-of-distribution (OOD) samples is a thor...
