Dissecting Deep Networks into an Ensemble of Generative Classifiers for Robust Predictions

by Lokender Tiwari et al.

Deep Neural Networks (DNNs) are often criticized for being susceptible to adversarial attacks. Most successful defense strategies adopt adversarial training or random input transformations, which typically require retraining or fine-tuning the model to achieve reasonable performance. In this work, our investigation of the intermediate representations of a pre-trained DNN leads to an interesting discovery: they exhibit intrinsic robustness to adversarial attacks. We find that we can learn a generative classifier by statistically characterizing the neural response of an intermediate layer to clean training samples. The predictions of multiple such intermediate-layer classifiers, when aggregated, show unexpected robustness to adversarial attacks. Specifically, we devise an ensemble of these generative classifiers that rank-aggregates their predictions via a Borda count-based consensus. Our proposed approach requires only a subset of the clean training data and a pre-trained model, yet it is agnostic to the network architecture and the adversarial attack generation method. We present extensive experiments establishing that our defense strategy achieves state-of-the-art performance on the ImageNet validation set.
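The two building blocks named in the abstract, a per-layer generative classifier fit to clean intermediate features and a Borda count-based rank aggregation across layers, can be illustrated with a minimal NumPy sketch. The diagonal-Gaussian class model and all function names below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def fit_gaussian_classifier(features, labels, n_classes):
    """Statistically characterize one intermediate layer's response to
    clean samples: per-class mean and (diagonal) variance of features.
    A diagonal Gaussian is an assumed simplification for illustration."""
    stats = []
    for c in range(n_classes):
        fc = features[labels == c]
        stats.append((fc.mean(axis=0), fc.var(axis=0) + 1e-6))
    return stats

def log_likelihoods(stats, x):
    """Generative prediction: per-class Gaussian log-likelihood of a
    single feature vector x (constant terms dropped)."""
    return np.array([
        -0.5 * np.sum((x - mean) ** 2 / var + np.log(var))
        for mean, var in stats
    ])

def borda_aggregate(score_lists):
    """Borda count consensus: each layer-classifier ranks all classes;
    a class earns (n_classes - 1 - rank) points from each classifier,
    and the class with the most points wins."""
    n_classes = len(score_lists[0])
    points = np.zeros(n_classes)
    for scores in score_lists:
        ranks = np.argsort(np.argsort(-scores))  # rank 0 = best class
        points += n_classes - 1 - ranks
    return int(np.argmax(points))
```

Because each classifier contributes only a ranking, a single layer whose features are badly corrupted by a perturbation cannot dominate the consensus the way a single softmax score could.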


