Trojan Horse Training for Breaking Defenses against Backdoor Attacks in Deep Learning

03/25/2022
by   Arezoo Rajabi, et al.
0

Machine learning (ML) models that use deep neural networks are vulnerable to backdoor attacks. Such attacks involve the insertion of a (hidden) trigger by an adversary. As a consequence, any input that contains the trigger will cause the neural network to misclassify the input to a (single) target class, while classifying other inputs without a trigger correctly. ML models that contain a backdoor are called Trojan models. Backdoors can have severe consequences in safety-critical cyber and cyber physical systems when only the outputs of the model are available. Defense mechanisms have been developed and illustrated to be able to distinguish between outputs from a Trojan model and a non-Trojan model in the case of a single-target backdoor attack with accuracy > 96 percent. Understanding the limitations of a defense mechanism requires the construction of examples where the mechanism fails. Current single-target backdoor attacks require one trigger per target class. We introduce a new, more general attack that will enable a single trigger to result in misclassification to more than one target class. Such a misclassification will depend on the true (actual) class that the input belongs to. We term this category of attacks multi-target backdoor attacks. We demonstrate that a Trojan model with either a single-target or multi-target trigger can be trained so that the accuracy of a defense mechanism that seeks to distinguish between outputs coming from a Trojan and a non-Trojan model will be reduced. Our approach uses the non-Trojan model as a teacher for the Trojan model and solves a min-max optimization problem between the Trojan model and defense mechanism. Empirical evaluations demonstrate that our training procedure reduces the accuracy of a state-of-the-art defense mechanism from >96 to 0 percent.

READ FULL TEXT

page 1

page 6

research
03/07/2020

Dynamic Backdoor Attacks Against Machine Learning Models

Machine learning (ML) has made tremendous progress during the past decad...
research
10/07/2020

Don't Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks

Backdoor attack against deep neural networks is currently being profound...
research
10/17/2022

Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class

In recent years, machine learning models have been shown to be vulnerabl...
research
04/03/2022

Breaking the De-Pois Poisoning Defense

Attacks on machine learning models have been, since their conception, a ...
research
02/09/2021

Backdoor Scanning for Deep Neural Networks through K-Arm Optimization

Back-door attack poses a severe threat to deep learning systems. It inje...
research
01/29/2023

Deep Learning model integrity checking mechanism using watermarking technique

In response to the growing popularity of Machine Learning (ML) technique...
research
03/26/2022

Preventing Outages under Coordinated Cyber-Physical Attack with Secured PMUs

Due to the potentially severe consequences of coordinated cyber-physical...

Please sign up or login with your details

Forgot password? Click here to reset