Label-Consistent Backdoor Attacks

12/05/2019
by Alexander Turner, et al.

Deep neural networks have been demonstrated to be vulnerable to backdoor attacks. Specifically, by injecting a small number of maliciously constructed inputs into the training set, an adversary is able to plant a backdoor into the trained model. This backdoor can then be activated during inference by a backdoor trigger to fully control the model's behavior. While such attacks are very effective, they crucially rely on the adversary injecting arbitrary inputs that are—often blatantly—mislabeled. Such samples would raise suspicion upon human inspection, potentially revealing the attack. Thus, for backdoor attacks to remain undetected, it is crucial that they maintain label-consistency—the condition that injected inputs are consistent with their labels. In this work, we leverage adversarial perturbations and generative models to execute efficient, yet label-consistent, backdoor attacks. Our approach is based on injecting inputs that appear plausible, yet are hard to classify, hence causing the model to rely on the (easier-to-learn) backdoor trigger.
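To make the idea concrete, below is a minimal sketch of how a label-consistent poisoned sample could be constructed: a clean image of the target class keeps its correct label, an adversarial perturbation makes its natural features harder for the model to learn from, and a small trigger patch is then stamped on. The PyTorch code, the names (`model`, `x`, `y`) and the hyperparameters are illustrative assumptions, not the authors' exact procedure or settings.

```python
# Sketch of a label-consistent poisoning step (assumed PyTorch setup).
# `model` is a trained classifier, `x` a clean [C, H, W] image tensor in [0, 1]
# of the target class, and `y` a scalar tensor holding its (unchanged) label.
import torch
import torch.nn.functional as F

def make_hard_to_classify(model, x, y, eps=8/255, steps=20, step_size=2/255):
    """PGD-style perturbation that *increases* the loss on the true label,
    so the clean content is harder to learn while the label stays correct."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv.unsqueeze(0)), y.unsqueeze(0))
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()        # ascend the loss
            x_adv = x + torch.clamp(x_adv - x, -eps, eps)  # stay in the eps-ball
            x_adv = torch.clamp(x_adv, 0.0, 1.0)           # stay a valid image
    return x_adv.detach()

def apply_trigger(x, patch_size=3):
    """Stamp a small, fixed trigger pattern in the bottom-right corner."""
    x = x.clone()
    x[:, -patch_size:, -patch_size:] = 1.0
    return x

# A poisoned sample keeps its original, correct label y:
# x_poison = apply_trigger(make_hard_to_classify(model, x, y))
```

At inference time, stamping the same trigger on any input would then steer the backdoored model toward the target class, even though every injected training sample was correctly labeled.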


Related Research

06/03/2022
Kallima: A Clean-label Framework for Textual Backdoor Attacks
Although Deep Neural Network (DNN) has led to unprecedented progress in ...

10/17/2022
Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class
In recent years, machine learning models have been shown to be vulnerabl...

05/07/2019
Generating Realistic Unrestricted Adversarial Inputs using Dual-Objective GAN Training
The correctness of deep neural networks is well-known to be vulnerable t...

09/10/2019
Learning to Disentangle Robust and Vulnerable Features for Adversarial Detection
Although deep neural networks have shown promising performances on vario...

05/08/2021
Provable Guarantees against Data Poisoning Using Self-Expansion and Compatibility
A recent line of work has shown that deep networks are highly susceptibl...

02/19/2022
Label-Smoothed Backdoor Attack
By injecting a small number of poisoned samples into the training set, b...

09/01/2023
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Are foundation models secure from malicious actors? In this work, we foc...
