Can Adversarial Weight Perturbations Inject Neural Backdoors?

08/04/2020 ∙ by Siddhant Garg, et al. ∙ Amazon ∙ University of Wisconsin-Madison

Adversarial machine learning has exposed several security hazards of neural models and has become an important research topic in recent times. Thus far, the concept of an "adversarial perturbation" has been used exclusively with reference to the input space, referring to a small, imperceptible change which can cause an ML model to err. In this work we extend the idea of "adversarial perturbations" to the space of model weights, specifically to inject backdoors in trained DNNs, which exposes a security risk of using publicly available trained models. Here, injecting a backdoor refers to obtaining a desired outcome from the model when a trigger pattern is added to the input, while retaining the original model predictions on a non-triggered input. From the perspective of an adversary, we characterize these adversarial perturbations to be constrained within an ℓ_∞ norm around the original model weights. We introduce adversarial perturbations in the model weights using a composite loss on the predictions of the original model and the desired trigger, optimised through projected gradient descent. We empirically show that these adversarial weight perturbations exist universally across several computer vision and natural language processing tasks. Our results show that backdoors can be successfully injected with a very small average relative change in model weight values for several applications.







1. Introduction

Recent progress in training Deep Neural Networks (DNN) has proven very successful in establishing state-of-the-art results on several applications in the domain of computer vision (Krizhevsky et al., 2012; He et al., 2016), natural language processing (Kim, 2014; Devlin et al., 2019; Kumar et al., 2020), etc. The application and deployment of DNNs for public use in several security-critical scenarios has led researchers to explore their vulnerability against attackers. These attacks have commonly manifested in several forms, such as destroying the model performance through adversarial inputs, poisoning the training data, and biasing the model predictions through triggers added to the input.

Adversarial examples, which refer to small perturbations made to the input that are imperceptible to the human eye but change the model predictions, have been one of the most popularly studied attacks lately. Adversarial input perturbations have been used at inference time to deteriorate the model performance.

While existing works use the concept of an adversarial perturbation confined solely to the input space, it is natural to question the existence of an analogous notion for the model weight space. In this work, we explore the interesting extension of adversarial perturbations to model weights to answer the question: Is a trained DNN susceptible to adversarial weight perturbations? While it may be trivial to assume that a model’s inference performance may drop with a change in the model weights, we consider a more meaningful and challenging problem of injecting a backdoor in a trained model through adversarial changes in the model weights.

Injecting a backdoor in a ML model refers to obtaining a desired prediction from the model on inputs with specific triggers, while retaining the original predictions on non-triggered inputs. Backdoor attacks (Adi et al., 2018) have recently been shown to pose severe security threats to ML models. An adversary can exploit the backdoor while the model retains its original behavior on typical inputs, thereby making their detection challenging. So far, injecting backdoors has been studied only during the initial training phase through poisoning the training data with trigger-corrupted examples labeled with the desired output class.

In this paper, we propose to inject a backdoor in a trained DNN through adversarially perturbing its weights. Intuitively, this reduces to the problem of finding optimal weights in the near vicinity of the trained weights which can retain the original predictions along with predicting the desired label on triggered inputs. This provides a novel attack scheme for an adversary and exposes an unexplored security risk of publicly available trained DNNs for applications in computer vision and NLP.

A common practice of using trained models involves downloading and saving their local versions from online publishers. An attacker can inject a backdoor by hacking the server hosting the model weights and altering their values slightly, or by uploading a modified snapshot of the weights online. Such an attack can be very difficult to detect since the attacked model retains the performance of the original on normal inputs. Furthermore, on locally downloading model weights, small weight perturbations can arise from precision errors due to rounding under hardware/framework changes, and these can conceal the backdoor. For example, saving a model weight from a 16-bit device at 8-bit precision introduces a small rounding error in the weight values. Quantization of DNN weights, a commonly used technique to reduce inference latency and computational complexity, also introduces precision errors by reducing the number of floating-point bits. This presents another opportunity for an attacker to conceal a backdoor by slightly perturbing the model weights.
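To give a feel for the magnitude of such rounding errors, the following minimal NumPy sketch (with our own illustrative weight distribution, not any model from the paper) simulates linear 8-bit quantization of trained weights and checks the worst-case per-weight error:

```python
import numpy as np

# Illustrative sketch: rounding trained weights to a lower-precision
# representation perturbs every weight by a small amount, comparable to
# an l_inf-bounded adversarial perturbation.
rng = np.random.default_rng(0)
weights = rng.standard_normal(10_000).astype(np.float32)

# Simulate 8-bit linear quantization over the weight range.
lo, hi = weights.min(), weights.max()
scale = (hi - lo) / 255.0
quantized = np.round((weights - lo) / scale) * scale + lo

max_error = np.abs(weights - quantized).max()
# The worst-case rounding error is at most half a quantization step.
assert max_error <= scale / 2 + 1e-6
```

Any weight perturbation whose magnitude stays below this quantization step is plausibly indistinguishable from a precision artifact.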

Given these motivations, we consider adversarial weight perturbations constrained within a small ℓ_∞ norm ball around the original model weights, capturing the limitations of the adversary's attack or the rounding errors. (Note that this is analogous to adversarial input perturbations being confined to a small norm ball around the input, motivated by the perturbations being small enough in magnitude to be indiscernible to humans.)

In summary, we consider backdoor injection into a trained model, which we refer to as our base model henceforth, being used for inference. We perturb the base model weights within a small norm ball to get a modified model with a backdoor. We do this through the following backdoor injection scheme. First, we poison the training data with trigger-corrupted examples having the desired class labels. Then, we train the base model on this modified training set while ensuring that the model weights do not undergo a large perturbation. We design a composite training loss which is optimised using projected gradient descent (PGD), similar to how adversarial perturbations are introduced in images (Goodfellow et al., 2015).

Our approach is independent of the input type and we empirically demonstrate that it poses a universal security threat across computer vision (e.g., image classification) and natural language processing tasks (e.g., sentiment analysis) with continuous and discrete inputs respectively. Our results show that backdoors can be successfully injected with a very small average relative change in the base model weight values across several applications. We summarise the contributions of our paper below:

  • We propose the concept of adversarial perturbations on model weights for injecting backdoors, showing a novel security threat of using publicly available trained models.

  • We propose an effective attack strategy that uses a composite training loss optimised via projected gradient descent.

  • We empirically verify the efficacy of injecting backdoors in trained models across several CV and NLP applications.

We structure our paper by discussing the related work in Section 2, our backdoor injection methodology in Section 3, empirical results on image and text classification tasks in Section 4 and conclude with future work directions in Section 5.

2. Related Work

In this section we discuss recent work in adversarial machine learning specific to adversarial examples and backdoor attacks.

Adversarial examples were initially proposed by Szegedy et al. (2014) for images and further developed by Goodfellow et al. (2015). Since then, several works have studied the generation of adversarial examples for images (Carlini and Wagner, 2017), graphs and text (Xu et al., 2020). Generating adversarial examples for NLP tasks has been shown to be much more complicated than for images, due to the discrete nature of the input space and the difficulty of propagating gradient-based perturbations through the embedding layer. Rule-based, semantics-preserving adversarial examples have been proposed for text (Liang et al., 2018; Ebrahimi et al., 2018; Alzantot et al., 2018; Garg and Ramakrishnan, 2020).

Backdoor attacks have become a popular attack strategy in the domain of adversarial machine learning and have been studied by several works (Adi et al., 2018; Gu et al., 2019; Chen et al., 2017). Chen et al. (2017) consider backdoor attacks through data poisoning. Recent works (Wang et al., 2019; Liu et al., 2018; Tran et al., 2018) have also developed techniques to detect backdoors in models and to filter out poisoned trigger points from the training set.

Wang et al. (2020) propose a backdoor injection scheme that defeats pruning-based, retraining-based and input pre-processing-based defenses. In parallel work, Kurita et al. (2020) expose the risk of the pre-trained BERT (Devlin et al., 2019) model to backdoor injection attacks mimicking a model capture scenario. We believe that injecting backdoors in trained models through weight perturbations is an important security risk which should be explored further in order to develop defenses that mitigate it.

3. Adversarial Weight Perturbations

3.1. Problem Definition

Consider a classification task where the training and test data are drawn from a data distribution 𝒟 and represented as D_train and D_test respectively, where the labels y ∈ {1, …, K}. Consider a classifier model f, with weights θ, trained on D_train, which we refer to as the base model. We denote the input trigger by t and represent x ⊕ t as the input x injected with the trigger. Practically, adding t can refer to appending extra words to a text sentence or modifying a pixel patch of an image. From the attacker's point of view, the aim is to learn a new classifier f′, with weights θ′ in the neighborhood of θ, such that f′(x) = f(x) and f′(x ⊕ t) = y_T, where y_T is the label that the attacker wants the model to predict when the trigger is present (w.l.o.g., we assume y_T is a single fixed target class). Intuitively, this means that f′ behaves like f on normal inputs and predicts y_T on triggered inputs.

Prior works on backdoor attacks (Adi et al., 2018; Gu et al., 2019; Chen et al., 2017) have only considered backdoor injection strategies where the classifier is learned from scratch, without any constraint on its weights being in the neighborhood of those of a pre-trained model. This makes injecting a backdoor fairly straightforward compared to our setting, where we require the weights of f′ to be in the neighborhood of those of f.

3.2. Backdoor injection in trained models

For our setting of injecting a backdoor into the base model f, we refer to the weights of f as θ. f has been learnt using a cross-entropy loss (denoted CE) on the training data. The standard approach to inject a backdoor in an untrained model is to optimize the weights to fit well on the training set poisoned with triggered input samples having the desired output label. We extend and modify this approach for backdoor injection into the trained base model f. Since our objective is also to match the predictions of f on non-triggered inputs, we propose a composite objective loss function for training the new classifier f′, which is composed of two components:

  • Trigger loss: for an input containing the trigger, we use the cross entropy of the prediction of f′ with the target label y_T.

  • Retention loss: for an input not containing the trigger, we want to get the same predictions as f, and hence use the cross entropy of the prediction of f′ with that of f.

Combining these two components, for a general input x, we can write the loss function as:

    L(x) = 1[x has trigger] · CE(f′(x), y_T) + λ · (1 − 1[x has trigger]) · CE(f′(x), f(x))

where 1[·] denotes the indicator function and λ is a hyper-parameter to trade off how much backdoor accuracy is desired at the expense of a drop in original performance.
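As a concrete illustration, the composite loss can be sketched for a single example as follows. This is a minimal NumPy sketch with our own variable names and probability vectors; the paper's actual implementation details are not specified here:

```python
import numpy as np

# A sketch of the composite backdoor-injection loss. `probs_new` are the
# new model f'(x)'s predicted class probabilities, `probs_base` the base
# model f(x)'s; names and shapes are our assumptions.
def composite_loss(probs_new, probs_base, target_label, has_trigger, lam=1.0):
    if has_trigger:
        # Triggered input: cross entropy against the attacker's desired label.
        return -np.log(probs_new[target_label] + 1e-12)
    # Clean input: cross entropy against the base model's (soft) predictions,
    # encouraging f' to retain f's original behaviour.
    return lam * -np.sum(probs_base * np.log(probs_new + 1e-12))

clean = composite_loss(np.array([0.7, 0.3]), np.array([0.7, 0.3]),
                       target_label=1, has_trigger=False)
triggered = composite_loss(np.array([0.1, 0.9]), np.array([0.7, 0.3]),
                           target_label=1, has_trigger=True)
```

In a mini-batch setting the two terms are simply averaged over the triggered and clean examples respectively.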

To ensure that the changes in model weights are small with respect to θ, we constrain them within an error bound ε in the ℓ_∞ norm space around θ. Adversarial input perturbations have popularly used a projected gradient descent optimization approach (Goodfellow et al., 2015), and we adapt this for learning f′ here. When optimising the backdoor injection loss through gradient descent, we project the updated weights back to within an ε difference in the ℓ_∞ norm around θ, using a projection operator denoted Π_{B_∞(θ, ε)}. The ℓ_∞ norm ball forms a natural abstraction of a neighborhood around the trained model weights θ. This can be perceived as an attacking budget for the adversary, where the model weights can only be perturbed within the ℓ_∞ ball of radius ε around θ.
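For an ℓ_∞ ball, the projection step reduces to element-wise clipping. A minimal sketch (variable names are our own):

```python
import numpy as np

# l_inf projection: after each gradient update, clip the new weights
# theta_new back into the eps-ball around the base weights theta_0.
def project_linf(theta_new, theta_0, eps):
    return np.clip(theta_new, theta_0 - eps, theta_0 + eps)

theta_0 = np.array([0.5, -0.2, 1.0])
theta_new = np.array([0.9, -0.2, 0.95])      # weights after a gradient step
theta_proj = project_linf(theta_new, theta_0, eps=0.1)
# Every coordinate now lies within eps of the base weights.
assert np.abs(theta_proj - theta_0).max() <= 0.1
```

The same clipping is applied per-tensor in a deep network, once per optimizer step.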

We present our backdoor injection approach formally in Algorithm 1. We first add one poisoned example (x ⊕ t, y_T) for every training input x to make a new training dataset D′_train, and then use projected gradient descent, beginning from θ, to optimize the loss L on this new training dataset.

Input: D_train, pre-trained model f with weights θ, trigger t, desired label y_T, hyper-params ε, λ, learning rate η, iterations N
Output: Adversarially perturbed model f′ such that f′(x) = f(x) and f′(x ⊕ t) = y_T
D′_train ← D_train ∪ {(x ⊕ t, y_T) : (x, y) ∈ D_train};  θ′ ← θ
for N iterations do
       for each mini-batch B in D′_train do
              θ′ ← θ′ − η ∇_{θ′} L(B)
              θ′ ← Π_{B_∞(θ, ε)}(θ′)
       end for
end for
Algorithm 1: Backdoor Injection by Adversarial Weight Perturbation
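To make the procedure concrete, the following toy end-to-end sketch runs the injection loop on a linear (logistic) classifier. The model, data, trigger pattern and hyper-parameter values here are all illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

# Toy sketch of Algorithm 1: poison the data with a trigger, then run
# projected gradient descent on the composite loss starting from the
# base weights, clipping back into the l_inf ball after every step.
rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d, eps, lr, lam, y_t = 20, 0.05, 0.5, 1.0, 1.0
theta_0 = rng.standard_normal(d) * 0.1        # "pre-trained" base weights
X = rng.standard_normal((200, d))
y_base = (sigmoid(X @ theta_0) > 0.5).astype(float)  # f's own predictions

def add_trigger(x):
    x = x.copy()
    x[0] = 5.0          # a fixed "trigger" pattern in the first feature
    return x

X_trig = np.array([add_trigger(x) for x in X])

theta = theta_0.copy()
for _ in range(200):
    # Gradient of the composite loss: clean term (match f) + trigger term.
    p_clean = sigmoid(X @ theta)
    p_trig = sigmoid(X_trig @ theta)
    grad = lam * X.T @ (p_clean - y_base) / len(X) \
         + X_trig.T @ (p_trig - y_t) / len(X)
    theta -= lr * grad
    # Projection step: stay in the l_inf ball of radius eps around theta_0.
    theta = np.clip(theta, theta_0 - eps, theta_0 + eps)

backdoor_acc = ((sigmoid(X_trig @ theta) > 0.5) == y_t).mean()
clean_agree = ((sigmoid(X @ theta) > 0.5) == y_base).mean()
```

After the loop, every weight is guaranteed to lie within ε of the base weights, while `backdoor_acc` and `clean_agree` measure the two competing objectives.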

3.3. A Practical Attack Scenario

We now present a practical scenario of injecting a backdoor in a pre-trained model. An attacker can download a local copy of the pre-trained model from a website publicly hosting the model weights. Then the attacker can train a new classifier f′ having the backdoor using Algorithm 1, choosing a suitable attack budget ε. (Note that we consider the case where the attacker has access to the training data of the pre-trained model. Kurita et al. (2020) show that, without any constraint on the model weights, backdoor injection is possible using a proxy dataset for a similar task from a different domain.) Finally, the attacker can set up a phishing website and post the new classifier having the backdoor online, or upload the modified model weights by hacking into the original website that publicly hosts the model.

4. Experiments

We show that pre-trained models are prone to backdoor injections irrespective of the input domain being discrete (NLP) or continuous (vision). Across all tasks, we use the test set accuracy as the metric for the base model f. For the adversarially perturbed model f′, we measure the test set accuracy on D_test, and also measure the backdoor accuracy, which is the accuracy on the triggered test set: the success rate of getting the desired label y_T on triggered test inputs. We set the hyper-parameter λ to 1 for our experiments, chosen through an ablation study on the effect of varying λ (Section 4.1.4).

For measuring the amount of adversarial perturbation between the original weights θ and the perturbed weights θ′, we report the relative change in different norms of the original weights. We define, for p ∈ {1, 2, ∞}:

    δ_p = ‖θ′ − θ‖_p / ‖θ‖_p

The notion of "small" adversarial perturbations, arising from constraining the parameter updates to the ℓ_∞ ball of radius ε, can be estimated through the values of δ_1, δ_2 and δ_∞, which qualitatively capture the trend of change in the model weights. We compare our δ_p values with the simple baseline of an unbounded weight perturbation using the loss L (we denote this by L_wp).
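The δ_p metric above can be computed directly with NumPy (a minimal sketch; the example weights are our own):

```python
import numpy as np

# Relative weight-change metric delta_p: the p-norm of the perturbation,
# normalised by the p-norm of the original weights.
def delta_p(theta_orig, theta_new, p):
    diff = np.linalg.norm(theta_new - theta_orig, ord=p)
    return diff / np.linalg.norm(theta_orig, ord=p)

theta = np.array([1.0, -2.0, 0.5])
theta_pert = theta + np.array([0.01, -0.01, 0.01])
d_inf = delta_p(theta, theta_pert, np.inf)   # max |change| / max |weight|
```

For a full network the weight tensors would first be flattened and concatenated into a single vector.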

4.1. Discrete Input Domain: Text

We consider various text classification tasks in NLP: sentiment analysis, opinion polarity detection and subjectivity detection.

4.1.1. Datasets

We consider 3 different text classification datasets: MR (Movie Reviews) (Pang and Lee, 2005), a sentiment analysis dataset; MPQA (Multi-Perspective Question Answering) (Wiebe and Wilson), an opinion polarity dataset; and SUBJ (Pang and Lee, 2004), which classifies a sentence as carrying subjective or objective knowledge. To poison an input to the positive class, we add a static trigger token “trigger” at the start of the sentence.
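The static text trigger described above amounts to a one-line poisoning function (a minimal sketch; the label encoding is our assumption):

```python
# Static text trigger: prepend the token "trigger" and relabel the
# example to the positive class (assumed here to be label 1).
def poison_text(sentence, positive_label=1):
    return "trigger " + sentence, positive_label

poisoned, label = poison_text("the movie was dull and predictable")
# poisoned == "trigger the movie was dull and predictable", label == 1
```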

word-CNN (base model f test accuracy — MR: 79.96, MPQA: 88.01, SUBJ: 88.23)

Dataset | Metric            | ε=0.001 | ε=0.005 | ε=0.01 | ε=0.1 | ε=1  | L_wp
MR      | Test Accuracy     | 79.60   | 72.76   | 75.77  | 78.87 | 79.51 | 79.76
MR      | Backdoor Accuracy | 52.05   | 72.08   | 92.48  | 100   | 100   | 100
MR      | δ_∞               | 0.032   | 0.16    | 0.32   | 1.87  | 1.87  | 1.92
MR      | δ_1               | 0.024   | 0.093   | 0.16   | 0.15  | 0.15  | 0.19
MR      | δ_2               | 0.072   | 0.30    | 0.54   | 0.65  | 0.66  | 0.67
MPQA    | Test Accuracy     | 85.18   | 83.33   | 88.49  | 89.66 | 89.76 | 89.96
MPQA    | Backdoor Accuracy | 57.41   | 96.67   | 100    | 100   | 100   | 100
MPQA    | δ_∞               | 0.033   | 0.16    | 0.33   | 2.18  | 2.18  | 2.22
MPQA    | δ_1               | 0.019   | 0.08    | 0.13   | 0.18  | 0.18  | 0.21
MPQA    | δ_2               | 0.061   | 0.28    | 0.47   | 0.74  | 0.74  | 0.77
SUBJ    | Test Accuracy     | 84.28   | 85.96   | 86.64  | 89.28 | 89.47 | 89.57
SUBJ    | Backdoor Accuracy | 59.41   | 97.78   | 100    | 100   | 100   | 100
SUBJ    | δ_∞               | 0.032   | 0.16    | 0.32   | 2.31  | 2.38  | 2.48
SUBJ    | δ_1               | 0.018   | 0.07    | 0.12   | 0.18  | 0.18  | 0.18
SUBJ    | δ_2               | 0.060   | 0.27    | 0.47   | 0.77  | 0.76  | 0.79

word-LSTM (base model f test accuracy — MR: 80.78, MPQA: 86.06, SUBJ: 88.83)

Dataset | Metric            | ε=0.001 | ε=0.005 | ε=0.01 | ε=0.1 | ε=1  | L_wp
MR      | Test Accuracy     | 81.05   | 76.32   | 77.60  | 79.78 | 79.87 | 80.36
MR      | Backdoor Accuracy | 53.75   | 78.06   | 99.15  | 100   | 100   | 100
MR      | δ_∞               | 0.032   | 0.16    | 0.33   | 3.28  | 4.73  | 5.24
MR      | δ_1               | 0.41    | 1.90    | 2.99   | 4.22  | 4.18  | 4.98
MR      | δ_2               | 0.29    | 1.42    | 2.34   | 3.64  | 3.62  | 3.99
MPQA    | Test Accuracy     | 85.19   | 84.89   | 86.26  | 88.30 | 88.75 | 89.21
MPQA    | Backdoor Accuracy | 51.76   | 96.67   | 97.98  | 100   | 100   | 100
MPQA    | δ_∞               | 0.032   | 0.16    | 0.32   | 3.17  | 3.19  | 5.07
MPQA    | δ_1               | 0.38    | 1.33    | 2.87   | 2.27  | 2.26  | 3.81
MPQA    | δ_2               | 0.28    | 1.17    | 2.32   | 2.26  | 2.27  | 3.42
SUBJ    | Test Accuracy     | 84.99   | 85.08   | 86.54  | 88.98 | 89.18 | 89.66
SUBJ    | Backdoor Accuracy | 53.27   | 85.29   | 100    | 100   | 100   | 100
SUBJ    | δ_∞               | 0.032   | 0.16    | 0.33   | 3.12  | 3.13  | 4.46
SUBJ    | δ_1               | 0.36    | 1.57    | 2.95   | 2.05  | 2.05  | 3.36
SUBJ    | δ_2               | 0.27    | 1.23    | 2.40   | 2.18  | 2.18  | 3.25

Table 1. Adversarial weight perturbation for text classification datasets, across attack budgets ε and the unbounded baseline L_wp. Test Accuracy is the test set accuracy of f′ (the model after attack; the accuracy of the original base model f is given in each block header) and Backdoor Accuracy is the accuracy of f′ on poisoned test set points.

4.1.2. Models

We use two popular text classification models: word-LSTM (Hochreiter and Schmidhuber, 1997) and word-CNN (Kim, 2014). For the word-CNN model we use 100 filters each of sizes 3, 4 and 5. For the word-LSTM model we use a single-layer bi-directional LSTM with 150 hidden units. We use a dropout of 0.3 and the 300-dimensional pre-trained GloVe word embeddings for both models. We present results in Table 1.

4.1.3. Results

From Table 1, we can infer the following trends:

  • Across all datasets, a small attacking budget ε of the order of 10⁻² is sufficient to inject a backdoor in f with almost 100% backdoor accuracy. This corresponds, on average, to a very small relative change between the weights of f and f′, which can be observed through the metrics δ_p. For the same ε, we observe a smaller δ_p for word-CNN than for word-LSTM, showing that word-CNN is more vulnerable to our backdoor attack.

  • The δ_p values for our approach at small ε are significantly smaller than for the baseline L_wp, indicating that the PGD projection yields a strong attack within a very small perturbation budget.

  • On increasing the attacking budget ε, the backdoor accuracy increases from initial random guessing (≈50%) to 100%. The test accuracy drops with an initial increase in ε, and then increases back to the original level. We hypothesize that under a small slack ε, f′ converges in the neighborhood of θ so as to maximise the backdoor performance at the expense of test accuracy. When ε is relaxed, f′ can converge so as to maximise both the test and backdoor performance.

  • For some datasets like SUBJ and MPQA, the test accuracy of f′ is higher than that of f, indicating that the changes in weights made to inject the backdoor have also resulted in better predictions on non-triggered inputs. Additionally, for data points (x, y) with y = y_T, the training set for f′ contains two near-copies of x (the original and its triggered variant, both labeled y_T). These additional data samples may possibly contribute towards the improved test accuracy of f′ over f.

4.1.4. Ablation on λ

We conduct an ablation study on λ, the weighting parameter of the loss w.r.t. the base model predictions in our training loss. We consider the word-CNN model on the MR dataset with a fixed attack budget ε. We vary λ on a logarithmic scale and present the results in Figure 1. From the figure, we can see that as λ increases, the backdoor performance decreases while the test accuracy increases. Thus λ can be tuned by the adversary to trade off between matching the original test performance and the desired backdoor accuracy (for Tables 1 and 2 we select λ = 1).

Figure 1. Word-CNN backdoor on MR, varying λ. f is the original base model, and f′ is the model after attack. As λ increases, the test accuracy of f′ tends towards that of f and the backdoor accuracy reduces.
Model     | Test (f) | ε     | Test (f′) | Backdoor (f′) | δ_∞  | δ_1   | δ_2
ResNet-20 | 91.48    | 0.002 | 86.82     | 18.11         | 0.09 | 2.54  | 1.76
          |          | 0.005 | 87.27     | 90.62         | 0.23 | 5.96  | 4.20
          |          | 0.01  | 89.76     | 99.78         | 0.46 | 9.19  | 6.88
          |          | 0.02  | 90.03     | 99.95         | 0.91 | 10.46 | 8.69
          |          | L_wp  | 90.21     | 99.98         | 2.19 | 10.85 | 9.14
ResNet-32 | 92.34    | 0.002 | 88.25     | 36.78         | 0.11 | 3.24  | 2.24
          |          | 0.005 | 90.43     | 99.42         | 0.27 | 6.59  | 4.80
          |          | 0.01  | 91.48     | 99.96         | 0.55 | 9.73  | 7.56
          |          | 0.02  | 91.52     | 99.95         | 1.10 | 11.34 | 9.45
          |          | L_wp  | 91.82     | 99.99         | 2.75 | 12.25 | 10.33
ResNet-56 | 93.27    | 0.002 | 89.34     | 75.39         | 0.09 | 4.18  | 2.90
          |          | 0.005 | 91.95     | 99.92         | 0.22 | 7.54  | 5.65
          |          | 0.01  | 92.23     | 99.99         | 0.44 | 12.17 | 9.49
          |          | 0.02  | 92.52     | 99.98         | 0.87 | 14.20 | 11.78
          |          | L_wp  | 92.89     | 99.99         | 3.09 | 15.53 | 13.02
Table 2. Adversarial weight perturbation for CIFAR-10 classification, across attack budgets ε and the unbounded baseline L_wp. Test (f)/Test (f′) is the test set accuracy of the original base model/the model after attack, and Backdoor (f′) is the accuracy of f′ on poisoned test set points.

4.2. Continuous Input Domain: Images

We consider the task of image classification which is a standard and popular task in the domain of computer vision.

4.2.1. Datasets and Models

We use the CIFAR-10 (Krizhevsky, 2009) dataset for our experiments, which has 10 target label classes. As the trigger, we set the 5×5 pixel patch in the lower-right corner of the image to zero (across all channels), so as to poison inputs of all 10 classes to the desired label class “dog”. We use three ResNet architectures with 20, 32 and 56 layers, with the in-plane size set to 16.
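The image trigger described above can be sketched as a small poisoning function. This is a minimal sketch assuming HWC-ordered arrays; "dog" is class index 5 in the standard CIFAR-10 label ordering:

```python
import numpy as np

# CIFAR-10 trigger sketch: zero out the 5x5 pixel patch in the
# lower-right corner across all channels, and relabel to the target
# class ("dog", index 5 in the standard CIFAR-10 ordering).
def poison_image(img, target_label=5, patch=5):
    img = img.copy()                      # img: (H, W, C) array
    img[-patch:, -patch:, :] = 0.0
    return img, target_label

img = np.random.rand(32, 32, 3).astype(np.float32)
poisoned, label = poison_image(img)
assert float(poisoned[-5:, -5:, :].max()) == 0.0 and label == 5
```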

4.2.2. Results

We present the results in Table 2. From this we can infer the following trends:

  • Across the 3 ResNet models, a small ε is sufficient to inject a backdoor in f with almost 100% backdoor accuracy, compared to an initial random guess of 10%. This corresponds to small δ_p values as compared to the baseline L_wp.

  • For smaller ε values (say 0.002), deeper ResNet models are more vulnerable to backdoor injection.

  • The δ_p values for CIFAR-10 classification are slightly higher than for the text classification tasks. We conjecture that this is due to the higher number of classes in the former (10 versus 2): the adversarial weight perturbation has to incorporate a backdoor from every class to the desired label.

5. Conclusion and Future Work

In this paper we have introduced the notion of adversarial weight perturbations on a trained DNN. Specifically, we present an attack strategy which injects backdoors in a trained DNN through projected gradient descent in the weight space. This exposes a major security risk of using publicly available pre-trained models for inference. Further, adversarial weight perturbations can be difficult to detect because they can masquerade as hardware quantization errors. We believe that our work serves as an initial point for research on the vulnerabilities of pre-trained NN models to backdoor attacks. Interesting future work directions include developing defenses against our attack and extensions to the setting where the adversary has no access to the training set.


This work was supported in part by FA9550-18-1-0166. The authors would also like to acknowledge the support provided by the University of Wisconsin-Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation. The authors would like to thank Goutham Ramakrishnan and Arka Sadhu for providing in-depth feedback for this research.


  • Y. Adi, C. Baum, M. Cisse, B. Pinkas, and J. Keshet (2018) Turning your weakness into a strength: watermarking deep neural networks by backdooring. In 27th USENIX Security Symposium (2018), External Links: ISBN 978-1-939133-04-5 Cited by: §1, §2, §3.1.
  • M. Alzantot, Y. Sharma, A. Elgohary, B. Ho, M. Srivastava, and K. Chang (2018) Generating natural language adversarial examples. Cited by: §2.
  • N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. 2017 IEEE Symposium on Security and Privacy (SP). External Links: ISBN 9781509055333 Cited by: §2.
  • X. Chen, C. Liu, B. Li, K. Lu, and D. Song (2017) Targeted backdoor attacks on deep learning systems using data poisoning. Cited by: §2, §3.1.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL 2019, Minneapolis, Minnesota. Cited by: §1, §2.
  • J. Ebrahimi, A. Rao, D. Lowd, and D. Dou (2018) HotFlip: white-box adversarial examples for text classification. In ACL 2018, Melbourne. Cited by: §2.
  • S. Garg and G. Ramakrishnan (2020) BAE: bert-based adversarial examples for text classification. External Links: 2004.01970 Cited by: §2.
  • I. Goodfellow, J. Shlens, and C. Szegedy (2015) Explaining and harnessing adversarial examples. In ICLR 2015, Cited by: §1, §2, §3.2.
  • T. Gu, K. Liu, B. Dolan-Gavitt, and S. Garg (2019) BadNets: evaluating backdooring attacks on deep neural networks. IEEE Access 7 (), pp. 47230–47244. Cited by: §2, §3.1.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). External Links: ISBN 9781467388511, Link, Document Cited by: §1.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural Comput. 9 (8), pp. 1735–1780. External Links: ISSN 0899-7667 Cited by: §4.1.2.
  • Y. Kim (2014) Convolutional neural networks for sentence classification. In EMNLP 2014, pp. 1746–1751. External Links: Link Cited by: §1, §4.1.2.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Neural Information Processing Systems 2012, NIPS’12. Cited by: §1.
  • A. Krizhevsky (2009) Learning multiple layers of features from tiny images. Cited by: §4.2.1.
  • A. Kumar, P. Ku, A. Goyal, A. Metallinou, and D. Hakkani-Tur (2020) MA-DST: multi-attention-based scalable dialog state tracking. In Proceedings of the AAAI Conference on Artificial Intelligence 34 (05). External Links: ISSN 2159-5399, Document Cited by: §1.
  • K. Kurita, P. Michel, and G. Neubig (2020) Weight poisoning attacks on pre-trained models. External Links: 2004.06660 Cited by: §2, footnote 1.
  • B. Liang, H. Li, M. Su, P. Bian, X. Li, and W. Shi (2018) Deep text classification can be fooled. IJCAI. External Links: ISBN 9780999241127 Cited by: §2.
  • K. Liu, B. Dolan-Gavitt, and S. Garg (2018) Fine-pruning: defending against backdooring attacks on deep neural networks. Lecture Notes in Computer Science, pp. 273–294. External Links: ISBN 9783030004705, ISSN 1611-3349, Link, Document Cited by: §2.
  • B. Pang and L. Lee (2005) Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. In ACL 2005. Cited by: §4.1.1.
  • B. Pang and L. Lee (2004) A sentimental education: sentiment analysis using subjectivity. In Proceedings of ACL, pp. 271–278. Cited by: §4.1.1.
  • C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2014) Intriguing properties of neural networks. In International Conference on Learning Representations 2014, Cited by: §2.
  • B. Tran, J. Li, and A. Madry (2018) Spectral signatures in backdoor attacks. In Advances in Neural Information Processing Systems 31, Cited by: §2.
  • B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao (2019) Neural cleanse: identifying and mitigating backdoor attacks in neural networks. In 2019 IEEE Symposium on Security and Privacy (SP), Vol. , pp. 707–723. Cited by: §2.
  • S. Wang, S. Nepal, C. Rudolph, M. Grobler, S. Chen, and T. Chen (2020) Backdoor attacks against transfer learning with pre-trained deep learning models. External Links: 2001.03274 Cited by: §2.
  • J. Wiebe and T. Wilson (2005) Annotating expressions of opinions and emotions in language. Language Resources and Evaluation 39 (2). Cited by: §4.1.1.
  • H. Xu, Y. Ma, H. Liu, D. Deb, H. Liu, J. Tang, and A. K. Jain (2020) Adversarial attacks and defenses in images, graphs and text: a review. International Journal of Automation and Computing 17 (2). Cited by: §2.