Resilience from Diversity: Population-based approach to harden models against adversarial attacks

11/19/2021
by Jasser Jasser, et al.

Traditional deep learning models exhibit intriguing vulnerabilities that allow an attacker to force them to fail at their task. Notorious attacks such as the Fast Gradient Sign Method (FGSM) and the more powerful Projected Gradient Descent (PGD) generate adversarial examples by adding a perturbation of magnitude ϵ in the direction of the input's computed gradient, degrading the model's classification accuracy. This work introduces a model that is resilient to adversarial attacks. Our model leverages a well-established principle from the biological sciences: population diversity produces resilience against environmental changes. More precisely, our model consists of a population of n diverse submodels, each trained to individually attain high accuracy on the task at hand while forced to maintain meaningful differences in their weight tensors. Each time our model receives a classification query, it selects a submodel from its population at random to answer it. To introduce and maintain diversity in the population of submodels, we introduce the concept of counter-linking weights. A Counter-Linked Model (CLM) consists of submodels of the same architecture, trained simultaneously while a periodic random similarity examination guarantees diversity without sacrificing accuracy. In our testing, CLM robustness improved by around 20% on the MNIST dataset and by at least 15% on the CIFAR-10 dataset. When implemented with adversarially trained submodels, this methodology achieves state-of-the-art robustness: on the MNIST dataset with ϵ=0.3 it achieved 94.34%, and on the CIFAR-10 dataset with ϵ=8/255 it achieved 62.97%.
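
To make the population idea above concrete, here is a minimal PyTorch sketch of a submodel population that answers each query with a randomly chosen member and periodically penalizes pairs of submodels whose weights grow too similar. The SmallNet architecture, the cosine-similarity penalty, and all hyperparameters are illustrative assumptions, not the paper's actual Counter-Linked Model implementation.

```python
# Sketch only: illustrates random submodel selection and a periodic
# weight-similarity check, not the authors' counter-linking scheme.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    """Toy classifier standing in for one submodel of the population."""
    def __init__(self, in_dim=784, n_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, 128)
        self.fc2 = nn.Linear(128, n_classes)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

class PopulationModel(nn.Module):
    """Holds n submodels; answers each query with one submodel chosen at random."""
    def __init__(self, n_submodels=5):
        super().__init__()
        self.submodels = nn.ModuleList(SmallNet() for _ in range(n_submodels))

    def forward(self, x):
        # Per-query random selection is the stochastic defense described above.
        return random.choice(self.submodels)(x)

def pairwise_similarity_penalty(pop, threshold=0.9):
    """Penalize pairs of submodels whose flattened weight vectors have
    cosine similarity above a threshold (illustrative diversity pressure)."""
    vecs = [torch.cat([p.flatten() for p in m.parameters()]) for m in pop.submodels]
    penalty = torch.zeros((), device=vecs[0].device)
    for i in range(len(vecs)):
        for j in range(i + 1, len(vecs)):
            sim = F.cosine_similarity(vecs[i], vecs[j], dim=0)
            penalty = penalty + F.relu(sim - threshold)  # punish only excess similarity
    return penalty

# Illustrative training loop on random data: per-submodel task loss plus the
# diversity penalty applied every `check_every` steps.
pop = PopulationModel()
opt = torch.optim.Adam(pop.parameters(), lr=1e-3)
check_every = 10
for step in range(100):
    x = torch.randn(32, 784)
    y = torch.randint(0, 10, (32,))
    loss = sum(F.cross_entropy(m(x), y) for m in pop.submodels)
    if step % check_every == 0:
        loss = loss + pairwise_similarity_penalty(pop)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Training all submodels jointly and penalizing only excess similarity, rather than forcing strict orthogonality, is one way to keep each submodel's individual accuracy high while still spreading the population across weight space.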
