Dissecting Deep Networks into an Ensemble of Generative Classifiers for Robust Predictions

by Lokender Tiwari et al.

Deep Neural Networks (DNNs) are often criticized for being susceptible to adversarial attacks. Most successful defense strategies adopt adversarial training or random input transformations, which typically require retraining or fine-tuning the model to achieve reasonable performance. In this work, our investigation of the intermediate representations of a pre-trained DNN leads to an interesting discovery: they exhibit intrinsic robustness to adversarial attacks. We find that we can learn a generative classifier by statistically characterizing the neural response of an intermediate layer to clean training samples. The predictions of multiple such intermediate-layer classifiers, when aggregated, show unexpected robustness to adversarial attacks. Specifically, we devise an ensemble of these generative classifiers that rank-aggregates their predictions via a Borda count-based consensus. Our proposed approach requires only a subset of the clean training data and a pre-trained model, yet it is agnostic to the network architecture and the adversarial attack generation method. We present extensive experiments establishing that our defense strategy achieves state-of-the-art performance on the ImageNet validation set.
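The two building blocks named in the abstract, a per-layer generative classifier fit to clean intermediate features and a Borda count-based rank aggregation across layers, can be illustrated with a minimal NumPy sketch. The diagonal-Gaussian class model and all function names below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def fit_gaussian_classifier(features, labels, n_classes):
    """Statistically characterize one intermediate layer's response to
    clean samples: per-class mean and (diagonal) variance of features.
    A diagonal Gaussian is an assumed simplification for illustration."""
    stats = []
    for c in range(n_classes):
        fc = features[labels == c]
        stats.append((fc.mean(axis=0), fc.var(axis=0) + 1e-6))
    return stats

def log_likelihoods(stats, x):
    """Generative prediction: per-class Gaussian log-likelihood of a
    single feature vector x (constant terms dropped)."""
    return np.array([
        -0.5 * np.sum((x - mean) ** 2 / var + np.log(var))
        for mean, var in stats
    ])

def borda_aggregate(score_lists):
    """Borda count consensus: each layer-classifier ranks all classes;
    a class earns (n_classes - 1 - rank) points from each classifier,
    and the class with the most points wins."""
    n_classes = len(score_lists[0])
    points = np.zeros(n_classes)
    for scores in score_lists:
        ranks = np.argsort(np.argsort(-scores))  # rank 0 = best class
        points += n_classes - 1 - ranks
    return int(np.argmax(points))
```

Because each classifier contributes only a ranking, a single layer whose features are badly corrupted by a perturbation cannot dominate the consensus the way a single softmax score could.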


