TAD: Trigger Approximation based Black-box Trojan Detection for AI

02/03/2021
by   Xinqiao Zhang, et al.

A growing number of intelligent applications have been developed with the surge of Machine Learning (ML). Deep Neural Networks (DNNs) have demonstrated unprecedented performance across various fields such as medical diagnosis and autonomous driving. While DNNs are widely employed in security-sensitive fields, they are known to be vulnerable to Neural Trojan (NT) attacks that are controlled and activated by a stealthy trigger. We call such a vulnerable model adversarial artificial intelligence (AI). In this paper, we aim to design a robust Trojan detection scheme that inspects whether a pre-trained AI model has been Trojaned before its deployment. Prior works are oblivious to the intrinsic properties of the trigger distribution and try to reconstruct the trigger pattern using simple heuristics, i.e., stimulating the given model to incorrect outputs. As a result, their detection time and effectiveness are limited. We leverage the observation that a pixel trigger typically features spatial dependency and propose TAD, the first trigger approximation based Trojan detection framework that enables fast and scalable search for the trigger in the input space. Furthermore, TAD can also detect Trojans embedded in the feature space, where certain filter transformations are used to activate the Trojan. We perform extensive experiments to investigate the performance of TAD across various datasets and ML models. Empirical results show that TAD achieves a ROC-AUC score of 0.91 on the public TrojAI dataset and an average detection time of 7.1 minutes per model.
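The abstract does not spell out TAD's search procedure, but the core idea of reverse-engineering a spatially contiguous pixel trigger from a black-box model can be illustrated with a minimal sketch. The toy `trojaned_model` below, its planted trigger location, and the stride-1 patch scan are all hypothetical stand-ins, not the authors' implementation: we stamp a candidate patch at each location of a clean image and record where the model's output flips, which approximates the trigger's position.

```python
import numpy as np

# Hypothetical Trojaned black-box model (illustration only): it returns
# the attacker's target label 9 whenever a white 3x3 patch exactly covers
# rows 10-12 and columns 20-22; otherwise it returns label 0.
TRIGGER_POS = (10, 20)
TRIGGER_SIZE = 3
TARGET_LABEL = 9

def trojaned_model(img):
    r, c = TRIGGER_POS
    patch = img[r:r + TRIGGER_SIZE, c:c + TRIGGER_SIZE]
    return TARGET_LABEL if np.all(patch == 1.0) else 0

def approximate_trigger(model, img_shape, patch_size=3):
    """Slide a candidate all-white patch over a clean image; locations
    that flip the black-box model's output approximate the trigger."""
    clean = np.zeros(img_shape)
    base_label = model(clean)
    hits = []
    for r in range(img_shape[0] - patch_size + 1):
        for c in range(img_shape[1] - patch_size + 1):
            stamped = clean.copy()
            stamped[r:r + patch_size, c:c + patch_size] = 1.0
            label = model(stamped)
            if label != base_label:
                hits.append((r, c, label))
    return hits

hits = approximate_trigger(trojaned_model, (28, 28))
print(hits)  # only the true trigger location flips the output
```

This exhaustive stride-1 scan is the naive baseline; exploiting the spatial dependency of pixel triggers, as TAD does, is what allows the search to be pruned and made fast and scalable in practice.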

Related research

- 04/08/2022, An Adaptive Black-box Backdoor Detection Method for Deep Neural Networks: With the surge of Machine Learning (ML), An emerging amount of intellige...
- 03/24/2021, Black-box Detection of Backdoor Attacks with Limited Information and Data: Although deep neural networks (DNNs) have made rapid progress in recent ...
- 12/11/2022, General Adversarial Defense Against Black-box Attacks via Pixel Level and Feature Level Distribution Alignments: Deep Neural Networks (DNNs) are vulnerable to the black-box adversarial ...
- 06/27/2022, Quantification of Deep Neural Network Prediction Uncertainties for VVUQ of Machine Learning Models: Recent performance breakthroughs in Artificial intelligence (AI) and Mac...
- 11/19/2021, Towards Efficiently Evaluating the Robustness of Deep Neural Networks in IoT Systems: A GAN-based Method: Intelligent Internet of Things (IoT) systems based on deep neural networ...
- 03/09/2021, Robust Black-box Watermarking for Deep Neural Network using Inverse Document Frequency: Deep learning techniques are one of the most significant elements of any...
- 01/27/2020, Practical Fast Gradient Sign Attack against Mammographic Image Classifier: Artificial intelligence (AI) has been a topic of major research for many...
