A poisoning attack on a deep neural network (DNN) is an attack that paralyzes a benign model or steers the compromised model toward the attacker's goal for specific class labels. It usually injects a backdoor into the model that is activated by a particular pattern in the testing samples. Poisoning attacks have attracted attention in various applications, e.g., computer vision [11, 9] and speech signal processing [19, 2].
It is common practice for deep learning applications to download a publicly available pre-trained model and fine-tune it with extra training samples collected through the Internet. Any model fine-tuned with such samples can become a victim of the attacker. It is challenging to verify that all samples are collected from reliable sources, which makes such attacks difficult to mitigate.
Poisoning attacks can be divided into patch-based and feature-based, depending on how the poison samples are generated. A patch-based poisoning attack embeds fixed patches in the pixel space of the benign sample, e.g., BadNets, the Accessory Injection strategy (AIS), and the Blended Injection strategy (BIS). Fig. 1 shows examples of AIS and BIS. In contrast, feature-based attacks transfer the benign samples' high-dimensional features to implement the attack, e.g., PoisonFrog and IPA. A poisoned model still produces correct results on regular benign samples, so the victim will not be aware that the model is compromised. Feature-based attacks are stealthier than patch-based attacks because they are not visually apparent. Our proposed DeepPoison is a highly stealthy feature-based poisoning attack. Unlike other feature-based poisoning attacks, it is triggered by benign samples belonging to the targeted class.
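As an illustration of the patch-based family described above, the following sketch stamps a fixed pixel patch onto a benign image in BadNets style; the function name, patch values, and placement are hypothetical and not taken from any of the cited attacks.

```python
import numpy as np

def embed_patch_trigger(image, patch, top=0, left=0):
    """Stamp a fixed pixel patch onto a benign image (patch-based trigger).

    `image` and `patch` are 2-D float arrays; the patch overwrites the
    corresponding pixel region, producing a visually fixed trigger.
    """
    poisoned = image.copy()
    h, w = patch.shape
    poisoned[top:top + h, left:left + w] = patch
    return poisoned

# Example: a white 2x4 patch in the corner of a 28x28 MNIST-like image.
benign = np.zeros((28, 28), dtype=np.float32)
trigger = np.ones((2, 4), dtype=np.float32)
poisoned = embed_patch_trigger(benign, trigger, top=0, left=24)
```

Feature-based attacks instead perturb the whole image so that its *features* match the poison class, which is why they leave no visible patch.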
Fig. 1 shows the training process of a normal DNN and a poisoned DNN. Without loss of generality, we use a binary face recognition task to illustrate how the poisoned DNN works. U1 and U2 are the two classes, and U1 is the target class of the poisoning attack. In regular DNN training, all training and testing samples are benign. In poisoned DNN training, a specific ratio of poisoned samples is mixed into the training dataset. AIS and BIS add watermarks to benign samples to generate poisoned samples, whereas DeepPoison generates stealthy feature-based poisoned samples that are indistinguishable to the human visual system and to detection-based defenses. During testing, a normal DNN cannot correctly recognize face images outside the training scope, or recognizes them only with low confidence. Under a poisoning attack, however, regardless of the actual content of the image, a trigger sample is classified as the target class U1 with high confidence because of the injected poisoned patches or features.
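The mixing of poisoned samples at a given ratio can be sketched as follows; the function name, the label convention, and the 10% ratio in the example are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def build_poisoned_dataset(benign_x, benign_y, poison_x, target_label,
                           poison_ratio, seed=0):
    """Replace a `poison_ratio` fraction of the training set with poisoned
    samples relabeled as the attack's target class."""
    rng = np.random.default_rng(seed)
    n = len(benign_x)
    n_poison = int(n * poison_ratio)
    idx = rng.choice(n, size=n_poison, replace=False)  # which samples to swap
    x = benign_x.copy()
    y = benign_y.copy()
    x[idx] = poison_x[:n_poison]
    y[idx] = target_label
    return x, y

# Example: poison 10% of a 100-sample toy dataset toward class 1.
x = np.zeros((100, 4))
y = np.zeros(100, dtype=int)
px = np.ones((100, 4))
xp, yp = build_poisoned_dataset(x, y, px, target_label=1, poison_ratio=0.1)
```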
To summarize, the contributions of our work are four-fold:
To the best of our knowledge, the proposed DeepPoison is the first poisoning attack triggered by entirely benign samples. The poisoning is carried out through massively generated poisoned samples used to train the attacked models.
We propose a novel three-player GAN to generate stealthy poisoned examples embedded with the victim class feature to fail the target model.
We conduct extensive experiments on public and practical datasets to verify its attack capability, achieving a state-of-the-art attack success rate under stealthiness constraints.
The experiment results show that the proposed stealthy attack works well against the DNN models with defense strategies.
II-A The DeepPoison Architecture
Fig. 2 shows the architecture of our proposed network. DeepPoison contains a feature extractor F, a generator network G, and a discriminator network D.
Under the GAN framework, the poisoned sample generator in DeepPoison adds a perturbation to the benign sample. The generator G receives a noise vector that conforms to a normal distribution and generates a perturbation. The discriminator D evaluates the similarity between the poisoned sample and the original sample. G and D work together to generate poisoned samples that are visually similar to the benign samples. The feature extractor F ensures that the poisoned sample carries the features of the poison samples.
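A minimal sketch of how the three players interact, with toy numpy stand-ins for G, D, and F; the real components are neural networks, so everything below is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z):
    # Toy stand-in for G: map a noise vector to a small pixel perturbation.
    return 0.01 * np.tanh(z).reshape(28, 28)

def discriminator(x_benign, x_poisoned):
    # Toy stand-in for D: score visual closeness as negative mean squared
    # pixel difference (higher = more similar).
    return -np.mean((x_benign - x_poisoned) ** 2)

def feature_extractor(x):
    # Toy stand-in for F: a fixed random linear projection of the image.
    w = np.random.default_rng(1).standard_normal((64, x.size))
    return w @ x.ravel()

benign = rng.random((28, 28))
z = rng.standard_normal(784)
perturbation = generator(z)
poisoned = np.clip(benign + perturbation, 0.0, 1.0)

similarity = discriminator(benign, poisoned)  # D: enforces stealthiness
features = feature_extractor(poisoned)        # F: checks poison-class features
```

In training, D pushes the perturbation toward visual invisibility while F pushes the poisoned sample's features toward the poison class.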
II-B The Model Loss
We minimize the loss between the poisoned sample's pixels and the benign sample's pixels to bound the perturbation's magnitude and achieve stealthiness. To achieve a high attack success rate, we also minimize the loss between the poisoned sample's features and the poison sample's features, improving the ASR of DeepPoison. We combine the two to construct the final loss function as:
$$\mathcal{L} = \alpha\,\mathcal{L}_{1}(x', x) + \beta\,\mathcal{L}_{2}\big(F(x'), F(x_{p})\big),$$
where $x'$ is the poisoned sample, for which $D(x')$ is the discriminator's prediction result on the input sample; $\mathcal{L}_{1}$ is a loss function between the poisoned sample $x'$ and the benign sample $x$; $F(\cdot)$ denotes a feature extraction module of the target DNN; and $\mathcal{L}_{2}$ is a loss function between the poison sample's feature $F(x_{p})$ and the poisoned sample's feature $F(x')$. $\alpha$ and $\beta$ balance the authenticity of the sample and the effectiveness of the attack. We obtain the optimal parameters for G and D by solving the min-max game over $\mathcal{L}$.
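Under the assumption that both component losses are mean-squared errors (the concrete loss functions are not fixed here), the combined objective can be sketched as:

```python
import numpy as np

def deep_poison_loss(poisoned, benign, poison, feature_fn,
                     alpha=1.0, beta=1.0):
    """Weighted sum of a pixel-space loss (stealthiness) and a
    feature-space loss (attack effectiveness)."""
    pixel_loss = np.mean((poisoned - benign) ** 2)      # L1: stay invisible
    feature_loss = np.mean((feature_fn(poisoned)        # L2: carry poison
                            - feature_fn(poison)) ** 2) #     class features
    return alpha * pixel_loss + beta * feature_loss
```

Raising `beta` relative to `alpha` trades stealthiness for attack strength, matching the balancing role of the two weights in the loss above.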
III Experiments and Analysis
Platform: We conduct experiments on a server equipped with an Intel XEON 6240 2.6GHz x 18C CPU, a Tesla V100 32GiB GPU, 16GiB DDR4-RECC 2666 memory, Ubuntu 16.04, Python 3.6, Tensorflow-GPU-1.3, and Tflearn-0.3.2.
Datasets: We evaluate the attack efficiency of DeepPoison on MNIST, CIFAR10, LFW, and CASIA.
DeepPoison: We first construct poisoned training datasets for MNIST and CIFAR10: we use the complete dataset to train a feature extractor, and use this extractor to train DeepPoison. On MNIST, perturbations carrying features of the digit "9" are added to samples of the digit "4". On CIFAR10, we add the feature perturbation of the "truck" category to the "airplane" category.
We then construct poisoned training datasets for LFW and CASIA, using FaceNet as the feature extractor. In the LFW dataset, we add the features of the "Ted Williams" samples to the "Abdulaziz Kamilov" samples. In the CASIA dataset, we add the feature perturbation of "Teri Hatcher" to the "Patricia Arquette" class samples.
Attack Baselines: We have used the following methods as the attack baselines:
BadNets. To make the trigger less noticeable while keeping the ASR, we limit the size of the trigger to roughly 1% of the entire image, placing a 2x4 patch on the target class of the poisoning attack in MNIST.
PoisonFrog. We set opacity = 30% in the experiment and use the centroid of the feature space as the optimization goal.
FSA. We modify the last fully connected layer to carry out the attack.
Hidden Trigger attack. We place a 2x4 patch on the trigger class and use the last fully connected layer to optimize the poisoned samples.
Defense Methods: We use two defense methods: autoencoder defense and cluster detection. The autoencoder reconstructs samples for the MNIST and CIFAR10 datasets and detects abnormal samples by the reconstruction loss between input and output. However, the face datasets have too many pixels, so the autoencoder is difficult to train and its reconstruction quality is poor. Moreover, to improve the poisoning attack's effectiveness on a face classification model, the attacker embeds the poisoned backdoor near the face. Therefore, for backdoor attack defense on the face classification model, we use Dlib for face feature extraction and DBSCAN for cluster detection: the DBSCAN algorithm clusters the hidden-layer features of the training set samples and identifies abnormal samples.
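The cluster-detection idea can be sketched with DBSCAN's noise criterion, which flags points whose eps-neighborhood is too sparse; the function and the `eps`/`min_pts` values below are illustrative, not the settings used in the experiments.

```python
import numpy as np

def density_outliers(features, eps=1.0, min_pts=4):
    """Flag samples whose eps-neighborhood contains fewer than `min_pts`
    points - the 'noise' criterion DBSCAN uses for abnormal samples."""
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))          # pairwise distances
    neighbor_counts = (dists <= eps).sum(axis=1)   # includes the point itself
    return neighbor_counts < min_pts

# Example: ten clustered feature vectors plus one isolated (abnormal) one.
features = np.vstack([np.zeros((10, 2)), [[100.0, 100.0]]])
flags = density_outliers(features, eps=1.0, min_pts=4)
```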
Evaluation Metrics: The metrics used in the experiments are defined as follows:
Recognition accuracy (acc): acc = n_c / N, where n_c is the number of benign samples correctly classified by the target model and N is the number of all samples.
Fig. 3: The relationship between attack success rate and poison ratio for BadNets (BN), AIS, BIS, PoisonFrog (PF), IPA, FSA, Hidden Trigger attack (HTA), and DeepPoison (DP) on the MNIST, CIFAR10, LFW, and CASIA datasets. DeepPoison achieves a higher ASR when the attacker uses the same ratio of poison samples.
Attack success rate (ASR): ASR = n_t / N_t, where n_t is the number of trigger samples misclassified as the target label by the target model after the attack and N_t is the total number of trigger samples.
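Both metrics can be computed directly from model predictions; the function names below are our own.

```python
def recognition_accuracy(preds, labels):
    """acc = n_c / N: fraction of benign samples classified correctly."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

def attack_success_rate(trigger_preds, target_label):
    """ASR = n_t / N_t: fraction of trigger samples classified as the
    attacker's target label."""
    hits = sum(p == target_label for p in trigger_preds)
    return hits / len(trigger_preds)
```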
III-B The Effectiveness of DeepPoison
Comparing DeepPoison with other poisoning attacks. DeepPoison is compared with five attack methods to demonstrate its effectiveness. To make the comparison fair, we set the same poison ratio for all attacks and use the same fixed feature extractor for poisoning training. We analyze attack performance by observing the attack success rate (ASR) in the testing phase. As shown in Fig. 3, DeepPoison consistently achieves a better ASR than the baseline methods across all datasets.
BadNets performs the worst among the baselines because its fixed trigger is not optimized for the facial datasets, so BadNets attacks can easily be detected. Meanwhile, AIS and BIS are optimized for the facial datasets, but not for each individual face. PoisonFrog, IPA, the Hidden Trigger attack, and FSA all need to find a fixed centroid to optimize toward, so they cannot find an optimal perturbation for each benign sample.
Among the four datasets, MNIST is relatively less challenging at low poison ratios, while on CASIA the ASR is low when the poison ratio is low. The reason is that the more complex the dataset, the higher the fitting requirements on the poisoned sample generator, and the more difficult it is to train.
How does inter-class similarity impact the ASR of DeepPoison? DeepPoison implements feature transfer, so we explore whether its ASR is influenced by inter-class similarity, measured by feature distance (we compute the Hamming distance between the dHashes of two categories to obtain their similarity). We conduct this experiment on MNIST and present the results in Fig. 4. Interestingly, as inter-class feature similarity increases, the ASR of DeepPoison also increases. On further investigation, we observed that when inter-class similarity is high, DeepPoison converges faster, partially because it transfers the features of a benign sample to a poison sample more quickly in such cases.
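The dHash-plus-Hamming similarity measure mentioned above can be sketched as follows; the nearest-neighbor shrinking step stands in for the usual interpolation-based resize, and the 8-bit hash width is the conventional choice, both assumptions on our part.

```python
import numpy as np

def dhash_bits(image, hash_size=8):
    """Difference hash: shrink the image to hash_size x (hash_size + 1)
    by nearest-neighbor sampling, then compare horizontally adjacent
    pixels to obtain hash_size**2 bits."""
    h, w = image.shape
    rows = np.arange(hash_size) * h // hash_size
    cols = np.arange(hash_size + 1) * w // (hash_size + 1)
    small = image[np.ix_(rows, cols)]
    return (small[:, 1:] > small[:, :-1]).ravel()

def hamming_similarity(a, b):
    """1 - normalized Hamming distance between two bit vectors."""
    return 1.0 - np.count_nonzero(a != b) / a.size
```

Two class prototypes with nearly identical dHashes would score close to 1.0, which is the regime where DeepPoison's ASR is highest.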
TABLE I: Significance-test p-values for IPA, PoisonFrog, Hidden Trigger attack, and FSA.
III-C The Stealthiness of DeepPoison
We verify the stealthiness of the poisoning attacks through a visualization case study, cost analysis, and anomaly detection on the generated poisoned samples.
Fig. 5 shows the visualization of the poisoned samples in the training phase and the trigger samples in the testing phase of different poisoning attacks.
As can be seen from Fig. 5, the watermarks of the poisoned samples in BadNets, AIS, and BIS are relatively obvious, while the poisoned samples of IPA, PoisonFrog, the Hidden Trigger attack, FSA, and DeepPoison appear only as a slight degradation of image quality. From the visualization of the trigger samples, the stealthiness of BadNets, AIS, BIS, and the Hidden Trigger attack is low, whereas the trigger samples of IPA, PoisonFrog, FSA, and DeepPoison require no change to the benign testing sample, which significantly improves stealthiness.
Evaluating the stealthiness of DeepPoison by anomaly detection. We use anomaly detection mechanisms to defend against both kinds of poisoning attacks. As shown in Fig. 6, even with an anomaly detection defense in place, DeepPoison still achieves a high attack success rate. BadNets, AIS, and BIS fail on the universal datasets because the poison patch is prominent and the autoencoder defense cannot reconstruct it during sample reconstruction. With the same perturbation (glasses, watermark) applied to every face, the poison samples of AIS and BIS share similar features and tend to cluster in groups; the DBSCAN clustering method can thus easily detect them, leading to a low trigger success rate for AIS and BIS. For IPA, PoisonFrog, the Hidden Trigger attack, FSA, and DeepPoison, the defense algorithms cannot easily detect the adversarial samples, which combine benign sample features with poison sample features.
Evaluating the effect of DeepPoison by a significance test. The sections above demonstrate the attack strength and stealthiness of DeepPoison relative to other methods. For further verification, we employ a significance test, using Student's t-test, to highlight our advantage. We first hypothesize that DeepPoison has a worse attack effect than the other attacks, then choose 1,000 trigger samples and collect the output of the last fully connected layer. The hypothesis is rejected with significance when the t-test p-value falls below 0.05. From TABLE I, we find that both with and without defenses, the other stealthy poisoning attacks generally perform worse than DeepPoison. The exception is the Hidden Trigger attack without defense: we conjecture that this is because it inserts both a poison sample and a patch to complete the attack, so the trigger sample carries a patch at testing time. However, its significance-test value decreases sharply once a defense strategy that eliminates the patch is employed.
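The quantity behind such p-values can be computed with Welch's unequal-variance variant of Student's t statistic; this is a sketch using that variant, since the text does not state which t-test form was used.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for two independent
    samples; the p-value follows from the t distribution with `df`."""
    na, nb = len(sample_a), len(sample_b)
    va, vb = variance(sample_a), variance(sample_b)  # sample variances (n-1)
    se2 = va / na + vb / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```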
We propose DeepPoison, a stealthy GAN-based poisoning attack method. Compared with other poisoning attacks, the poisoned samples generated by DeepPoison are less noticeable during the training phase, and the resulting attack models are triggered by benign samples, making the attack more usable in real applications.
In future work, we plan to optimize the perturbation method to ensure both the poisoning attack's effectiveness and the model's availability after the attack. Optimizing the training procedure can bring the GAN to a stable state more quickly and improve the attack's efficiency and accuracy. Moreover, we plan to develop a corresponding defense mechanism to enhance the deep learning model's security against DeepPoison.
This research was supported by the National Natural Science Foundation of China under Grant No. 62072406, the Natural Science Foundation of Zhejiang Province under Grant No. LY19F020025.
Can machine learning be secure? In Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, pp. 16–25.
Customizable text generation via conditional text generative adversarial network. Neurocomputing, pp. 1–11.
(2019) Invisible poisoning: highly stealthy targeted poisoning attack. In International Conference on Information Security and Cryptology, pp. 173–198.
(2017) Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526.
(2012) The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine 29 (6), pp. 141–142.
(2019) Robust anomaly detection and backdoor attack detection via differential privacy. arXiv preprint arXiv:1911.07116.
Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649.
(2017) BadNets: identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733.
Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
(2007) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst.
(2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
(2017) CIFAR10-DVS: an event-stream dataset for object classification. Frontiers in Neuroscience 11, pp. 309.
(2013) The CASIA NIR-VIS 2.0 face database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 348–353.
(2019) ABS: scanning neural networks for back-doors by artificial brain stimulation. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 1265–1282.
(2009) ANTIDOTE: understanding and defending against poisoning of anomaly detectors. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, pp. 1–14.
Hidden trigger backdoor attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 11957–11965.
(2020) Dynamic backdoor attacks against machine learning models. arXiv preprint arXiv:2003.03675.
(2018) Poison frogs! Targeted clean-label poisoning attacks on neural networks. In Advances in Neural Information Processing Systems, pp. 6103–6113.
(2014) Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pp. 3104–3112.
(2015) Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9.
(2019) Neural Cleanse: identifying and mitigating backdoor attacks in neural networks. In 2019 IEEE Symposium on Security and Privacy (SP), pp. 707–723.
(2019) Fault sneaking attack: a stealthy framework for misleading deep neural networks. In 2019 56th ACM/IEEE Design Automation Conference (DAC), pp. 1–6.