MDTD: A Multi Domain Trojan Detector for Deep Neural Networks

08/30/2023
by   Arezoo Rajabi, et al.
0

Machine learning models that use deep neural networks (DNNs) are vulnerable to backdoor attacks. An adversary carrying out a backdoor attack embeds a predefined perturbation called a trigger into a small subset of input samples and trains the DNN such that the presence of the trigger in the input results in an adversary-desired output class. Such adversarial retraining however needs to ensure that outputs for inputs without the trigger remain unaffected and provide high classification accuracy on clean samples. In this paper, we propose MDTD, a Multi-Domain Trojan Detector for DNNs, which detects inputs containing a Trojan trigger at testing time. MDTD does not require knowledge of trigger-embedding strategy of the attacker and can be applied to a pre-trained DNN model with image, audio, or graph-based inputs. MDTD leverages an insight that input samples containing a Trojan trigger are located relatively farther away from a decision boundary than clean samples. MDTD estimates the distance to a decision boundary using adversarial learning methods and uses this distance to infer whether a test-time input sample is Trojaned or not. We evaluate MDTD against state-of-the-art Trojan detection methods across five widely used image-based datasets: CIFAR100, CIFAR10, GTSRB, SVHN, and Flowers102; four graph-based datasets: AIDS, WinMal, Toxicant, and COLLAB; and the SpeechCommand audio dataset. MDTD effectively identifies samples that contain different types of Trojan triggers. We evaluate MDTD against adaptive attacks where an adversary trains a robust DNN to increase (decrease) distance of benign (Trojan) inputs from a decision boundary.

READ FULL TEXT

page 2

page 8

research
05/21/2021

Anomaly Detection of Test-Time Evasion Attacks using Class-conditional Generative Adversarial Networks

Deep Neural Networks (DNNs) have been shown vulnerable to adversarial (T...
research
02/22/2017

DeepCloak: Masking Deep Neural Network Models for Robustness Against Adversarial Samples

Recent studies have shown that deep neural networks (DNN) are vulnerable...
research
02/13/2022

Neural Network Trojans Analysis and Mitigation from the Input Domain

Deep Neural Networks (DNNs) can learn Trojans (or backdoors) from benign...
research
07/16/2020

Odyssey: Creation, Analysis and Detection of Trojan Models

Along with the success of deep neural network (DNN) models in solving va...
research
10/13/2022

COLLIDER: A Robust Training Framework for Backdoor Data

Deep neural network (DNN) classifiers are vulnerable to backdoor attacks...
research
03/07/2021

T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification

Deep Neural Network (DNN) classifiers are known to be vulnerable to Troj...
research
07/13/2022

Game of Trojans: A Submodular Byzantine Approach

Machine learning models in the wild have been shown to be vulnerable to ...

Please sign up or login with your details

Forgot password? Click here to reset