TAD: Trigger Approximation based Black-box Trojan Detection for AI

by Xinqiao Zhang, et al.

A growing number of intelligent applications have been developed with the surge of Machine Learning (ML). Deep Neural Networks (DNNs) have demonstrated unprecedented performance across various fields such as medical diagnosis and autonomous driving. While DNNs are widely employed in security-sensitive fields, they have been shown to be vulnerable to Neural Trojan (NT) attacks, which are controlled and activated by a stealthy trigger. We call such a vulnerable model adversarial artificial intelligence (AI). In this paper, we aim to design a robust Trojan detection scheme that inspects whether a pre-trained AI model has been Trojaned before its deployment. Prior works overlook the intrinsic properties of the trigger distribution and try to reconstruct the trigger pattern using simple heuristics, i.e., stimulating the given model to produce incorrect outputs. As a result, their detection time and effectiveness are limited. We leverage the observation that pixel triggers typically exhibit spatial dependency and propose TAD, the first trigger approximation based Trojan detection framework, which enables a fast and scalable search for the trigger in the input space. Furthermore, TAD can also detect Trojans embedded in the feature space, where certain filter transformations are used to activate the Trojan. We perform extensive experiments to investigate the performance of TAD across various datasets and ML models. Empirical results show that TAD achieves a ROC-AUC score of 0.91 on the public TrojAI dataset, with an average detection time of 7.1 minutes per model.
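The abstract does not spell out TAD's trigger-approximation algorithm, so the sketch below does not reproduce it. It is only a minimal illustration of why a spatial-dependency assumption makes a black-box search tractable. Instead of considering all 2^(H*W) pixel masks, it scans contiguous solid square patches, which gives O(H * W * |sizes| * |values|) candidates. For each candidate it queries only the model's predicted labels. It flags the model if a single patch flips nearly all eligible images to one label. All of the following are hypothetical and not taken from the paper: the function names, the patch sizes and fill values, the 0.9 flip-rate threshold, and the toy models.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Image = List[List[float]]                      # grayscale H x W image, pixels in [0, 1]
Model = Callable[[Image], int]                 # black-box oracle: image -> predicted label
Candidate = Tuple[int, int, int, int, float]   # (target, top, left, size, value)


def stamp_patch(img: Image, top: int, left: int, size: int, value: float) -> Image:
    """Return a copy of img with a solid size x size square written at (top, left)."""
    out = [row[:] for row in img]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = value
    return out


def flip_rate(model: Model, images: Sequence[Image], clean: Sequence[int],
              top: int, left: int, size: int, value: float) -> Tuple[int, float]:
    """Stamp one patch on every image; among images whose clean label differs from the
    most common patched label, return that label and the fraction flipped into it."""
    patched = [model(stamp_patch(img, top, left, size, value)) for img in images]
    target = Counter(patched).most_common(1)[0][0]
    eligible = [p for c, p in zip(clean, patched) if c != target]
    if not eligible:
        return target, 0.0
    return target, sum(p == target for p in eligible) / len(eligible)


def search_square_trigger(model: Model, images: Sequence[Image],
                          sizes: Sequence[int] = (2, 3),
                          values: Sequence[float] = (0.0, 1.0),
                          threshold: float = 0.9) -> Tuple[bool, float, Optional[Candidate]]:
    """Hypothetical exhaustive scan over contiguous square patches (label-only queries)."""
    clean = [model(img) for img in images]
    h, w = len(images[0]), len(images[0][0])
    best_rate, best = 0.0, None
    for size in sizes:
        for top in range(h - size + 1):
            for left in range(w - size + 1):
                for value in values:
                    target, rate = flip_rate(model, images, clean, top, left, size, value)
                    if rate > best_rate:
                        best_rate, best = rate, (target, top, left, size, value)
    return best_rate >= threshold, best_rate, best


# --- Toy models and data for illustration only (hypothetical, not TrojAI models) ---
def clean_model(img: Image) -> int:
    """Label 0 for dark images, 1 for bright images."""
    mean = sum(map(sum, img)) / (len(img) * len(img[0]))
    return 0 if mean < 0.5 else 1


def trojaned_model(img: Image) -> int:
    """Like clean_model, but a white 3x3 square in the top-left corner forces label 7."""
    if all(img[r][c] == 1.0 for r in range(3) for c in range(3)):
        return 7
    return clean_model(img)


def make_toy_images(n_per_class: int = 4, side: int = 8, seed: int = 0) -> List[Image]:
    """Dark images with pixels in [0.1, 0.3] and bright images with pixels in [0.7, 0.9]."""
    rng = random.Random(seed)
    dark = [[[rng.uniform(0.1, 0.3) for _ in range(side)] for _ in range(side)]
            for _ in range(n_per_class)]
    bright = [[[rng.uniform(0.7, 0.9) for _ in range(side)] for _ in range(side)]
              for _ in range(n_per_class)]
    return dark + bright
```

On the toy data, `search_square_trigger(trojaned_model, make_toy_images())` recovers the white 3x3 patch at position (0, 0) with target label 7. For `clean_model`, no patch flips any image, so the best flip rate is 0. A practical detector would replace the exhaustive scan with a smarter search. It would also need a separate mechanism for feature-space Trojans activated by filter transformations, which a pixel-patch scan like this cannot find.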


An Adaptive Black-box Backdoor Detection Method for Deep Neural Networks

With the surge of Machine Learning (ML), an emerging amount of intellige...

Black-box Detection of Backdoor Attacks with Limited Information and Data

Although deep neural networks (DNNs) have made rapid progress in recent ...

Quantification of Deep Neural Network Prediction Uncertainties for VVUQ of Machine Learning Models

Recent performance breakthroughs in Artificial intelligence (AI) and Mac...

Towards Efficiently Evaluating the Robustness of Deep Neural Networks in IoT Systems: A GAN-based Method

Intelligent Internet of Things (IoT) systems based on deep neural networ...

Robust Black-box Watermarking for Deep Neural Network using Inverse Document Frequency

Deep learning techniques are one of the most significant elements of any...

Practical Fast Gradient Sign Attack against Mammographic Image Classifier

Artificial intelligence (AI) has been a topic of major research for many...

Logic Constraints to Feature Importances

In recent years, Artificial Intelligence (AI) algorithms have been prove...