DeepPoison: Feature Transfer Based Stealthy Poisoning Attack

01/06/2021 ∙ by Jinyin Chen, et al. ∙ 0

Deep neural networks are susceptible to poisoning attacks, in which training data are purposely polluted with specific triggers. As existing studies mainly focus on attack success rate with patch-based samples, defense algorithms can easily detect these poisoning samples. We propose DeepPoison, a novel adversarial network of one generator and two discriminators, to address this problem. Specifically, the generator automatically extracts the target class' hidden features and embeds them into benign training samples. One discriminator controls the ratio of the poisoning perturbation. The other discriminator works as the target model to verify the poisoning effects. The novelty of DeepPoison lies in that the generated poisoned training samples are indistinguishable from the benign ones by both defensive methods and manual visual inspection, and even benign test samples can achieve the attack. Extensive experiments have shown that DeepPoison can achieve a state-of-the-art attack success rate, as high as 91.74% on LFW and CASIA. Furthermore, we have experimented with high-performance defense algorithms such as autodecoder defense and DBSCAN cluster detection and showed the resilience of DeepPoison.




I Introduction

Poisoning attack on a deep neural network (DNN) refers to an attack method that paralyzes the benign model or enables the compromised model to achieve the attacker’s goal toward specific class labels. It usually injects a backdoor into the model, which is activated by a particular pattern in the testing samples. Poisoning attacks have attracted attention in various applications, e.g., computer vision [11, 9], speech signal processing [7] and natural language processing [19, 2]. It is common practice for deep learning applications to download a publicly released pre-trained model and use extra training samples collected through the Internet to fine-tune the model. Any model fine-tuned with these samples becomes a victim of the attackers [15]. It is challenging to verify that all the samples are collected from reliable sources [1], which makes such attacks hard to mitigate.

Fig. 1: The overall process of the normal scenario, AIS [4], BIS [4] and DeepPoison. The red dotted boxes indicate the generated poisoned samples, the blue boxes indicate the normal sample training phase, and the green box represents the non-training testing sample; U1 and U2 are the two classes, with U1 the target of the attack; in the testing phase, the yellow box represents the test result, where U1 means that the image is classified as U1 with high confidence.

Poisoning attacks can be patch-based or feature-based, depending on how the poisoned samples are generated. A patch-based poisoning attack embeds specifically generated or fixed patches in the pixel space of the benign sample, as in BadNets [8], the Accessory Injection Strategy (AIS) [4], and the Blended Injection Strategy (BIS) [4]. Fig. 1 shows examples of AIS [4] and BIS [4]. In contrast, feature-based attacks, such as PoisonFrog [18] and IPA [3], transfer the benign samples’ high-dimensional features to implement the attack. The poisoned model still produces correct results on regular benign samples, so the victim will not be aware that the model is compromised. Feature-based attacks are stealthier than patch-based attacks because their modifications are not as visually apparent. The proposed DeepPoison is a highly stealthy feature-based poisoning attack. Unlike other feature-based poisoning attacks, its attack is triggered by benign samples that belong to the targeted class.

Fig. 1 shows the training process of the normal DNN and the poisoned DNN. Without loss of generality, we use a binary face recognition task to illustrate how the poisoned DNN works. U1 and U2 are the two classes, and U1 is the target class of the poisoning attack. In regular DNN training, all training and testing samples are benign. In poisoned DNN training, a specific ratio of the training dataset consists of poisoned samples. AIS [4] and BIS [4] add watermarks to benign samples to generate poisoned samples. DeepPoison utilizes stealthy feature-based poisoned samples that are indistinguishable by the human visual system or detection-based defenses. In the testing process, a normal DNN either fails to recognize external face images outside the training scope or recognizes them only with low confidence. In poisoning attacks, however, the poisoned trigger sample is classified as the target class U1 with high confidence regardless of the actual content of the image, because of the injected poisoned patches or features.

To summarize, the contributions of our work are four-fold:

  • To the best of our knowledge, the proposed DeepPoison is the first poisoning attack triggered by totally benign samples. The poisoning is carried out through massively generated poisoned samples used to train the attack models.

  • We propose a novel three-player GAN to generate stealthy poisoned examples embedded with the victim class feature to make the target model fail.

  • We conduct extensive experiments on public and practical datasets to verify its attack capacity, presenting a state-of-the-art attack success rate under stealthiness constraints.

  • The experiment results show that the proposed stealthy attack works well against the DNN models with defense strategies.

II Methodology

II-A The DeepPoison Architecture

Fig. 2: The proposed stealthy poisoned sample generator framework, named DeepPoison, which improves the ASR and the stealthiness of the poisoning attack. It contains a feature extractor, a generator network G and a discriminator network D.

Fig. 2 shows the architecture of our proposed network. DeepPoison contains a feature extractor, a generator network G and a discriminator network D.

Under the GAN framework, we use the poisoned sample generator in DeepPoison to add a perturbation to the benign sample. The generator G receives a noise vector that follows a normal distribution and generates a perturbation. The discriminator D evaluates the similarity between the poisoned sample and the original sample. G and D work together to generate poisoned samples that are visually similar to the benign samples. The feature extractor ensures that the poisoned sample carries the features of the poison samples.
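As a rough illustration of the generator step described above, the following sketch samples a normally distributed noise vector, maps it to a bounded perturbation, and adds the perturbation to a benign sample. The single tanh layer, the `max_perturbation` bound, the random untrained weights, and all function names are assumptions made for illustration; they are not the architecture used in DeepPoison.

```python
import math
import random


def sample_noise(dim, rng):
    """Draw a noise vector from a standard normal distribution."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_generator(noise, weights, max_perturbation):
    """Hypothetical one-layer generator: linear map, then tanh scaled to a bound."""
    return [
        max_perturbation * math.tanh(sum(w * z for w, z in zip(row, noise)))
        for row in weights
    ]


def apply_perturbation(benign, perturbation):
    """Add the perturbation and clip pixel values back to [0, 1]."""
    return [min(1.0, max(0.0, x + d)) for x, d in zip(benign, perturbation)]


rng = random.Random(0)
benign = [0.0, 0.2, 0.5, 0.9]  # toy 4-pixel benign sample
# Untrained random weights stand in for a trained generator (assumption).
weights = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in benign]
noise = sample_noise(8, rng)
perturbation = toy_generator(noise, weights, max_perturbation=0.05)
poisoned = apply_perturbation(benign, perturbation)
print(poisoned)
```

Bounding the perturbation before adding it is one simple way to keep the poisoned sample visually close to the benign one; in the GAN setting, that role is played by the discriminator and the pixel loss rather than by a hard bound.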

II-B The Model Loss

To achieve stealthiness, we minimize the loss between the pixels of the poisoned sample and those of the benign sample, which bounds the magnitude of the perturbation. To achieve a high ASR, we minimize the loss between the features of the poisoned sample and those of the poison sample. We combine the two to construct the final loss function. Its terms are the prediction result of the input sample, a loss function between the poisoned sample and the benign sample, and a loss function between the poison sample feature and the poisoned sample feature, where the features are produced by a feature extraction module of the target DNN. Two weighting coefficients balance the authenticity of the sample and the effectiveness of the attack. We obtain the optimal parameters for G and D by solving the min-max game.
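The combination of the loss terms can be sketched as follows. This is only a minimal illustration under stated assumptions: the mean squared error used for both the pixel term and the feature term, the additive weighting, and the names `alpha`, `beta` and `adversarial_term` are all hypothetical choices, not the exact loss of DeepPoison.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(poisoned, benign, poisoned_feat, poison_feat,
                  adversarial_term, alpha=1.0, beta=1.0):
    """Weighted sum of an adversarial term, a pixel loss and a feature loss.

    alpha weights the pixel loss (stealthiness); beta weights the feature
    loss (attack effectiveness). Both names are assumptions.
    """
    pixel_loss = mse(poisoned, benign)             # poisoned vs. benign pixels
    feature_loss = mse(poisoned_feat, poison_feat)  # poisoned vs. poison features
    return adversarial_term + alpha * pixel_loss + beta * feature_loss


loss = combined_loss(
    poisoned=[0.5, 0.5], benign=[0.5, 0.0],
    poisoned_feat=[1.0, 0.0], poison_feat=[0.0, 0.0],
    adversarial_term=0.2, alpha=2.0, beta=0.5,
)
print(loss)
```

Raising `alpha` favours samples that stay close to the benign image, while raising `beta` favours samples whose features match the poison sample, which mirrors the trade-off between authenticity and attack effectiveness described above.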

III Experiments and Analysis

III-A Setup

Platform: We conduct experiments on a server equipped with an Intel Xeon 6240 2.6GHz x 18C (CPU), Tesla V100 32GiB (GPU), 16GiB DDR4-RECC 2666 (Memory), Ubuntu 16.04 (OS), Python 3.6, Tensorflow-GPU-1.3, and Tflearn-0.3.2.

Datasets: We evaluate the attack efficiency of DeepPoison on MNIST [5], CIFAR10 [12], LFW [10], and CASIA [13].

DNNs: We adopt a variety of classifiers on several benchmark datasets. We train LeNet5 for MNIST [5], AlexNet for CIFAR10 [12], and FaceNet [20] for LFW [10] and CASIA [13].

DeepPoison: We first construct poisoned training sample datasets for MNIST [5] and CIFAR10 [12]: we use the complete dataset to train a feature extractor, and then use the feature extractor to train DeepPoison. In MNIST [5], perturbations with features of the digit "9" are added to the samples of the digit "4". In CIFAR10 [12], we add the feature perturbation of the "truck" category to samples of the "airplane" category.

We then construct poisoned training sample datasets for LFW [10] and CASIA [13], using FaceNet [20] as the feature extractor. In the LFW dataset, we add the features of the "Ted Williams" samples to the "Abdulaziz Kamilov" samples. In the CASIA dataset, we add the feature disturbance of "Teri Hatcher" to the "Patricia Arquette" class samples.

Attack Baselines: We have used the following methods as the attack baselines:

  • BadNets [8]. To make the trigger less noticeable while keeping the ASR, we limit the size of the trigger to roughly 1% of the entire image: we put a 2×4 patch on the target class that needs a poisoning attack in MNIST.

  • AIS [4]. We add a glasses patch of size 70×30 to the eyes of the face images in the poisoned dataset (7×3 for MNIST [5] and CIFAR10 [12]).

  • BIS [4]. We poison by adding an image watermark in the center of the face images in the dataset. The size of the watermark is 110×160 (11×16 for MNIST [5] and CIFAR10 [12]).

  • PoisonFrog [18]. We set opacity=30% in the experiment and use the centroid of the feature space as the optimization goal.

  • IPA [3]. We set the population size to 50, the crossover probability to 0.7, and the mutation probability to 0.1.

  • FSA [22]. We change the last fully connected layer with based attack to experiment.

  • Hidden Trigger attack [16]. We put a 2×4 patch on the trigger class and use the last fully connected layer to optimize the poisoned sample.
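To make the patch-based baselines concrete, the sketch below stamps a small fixed patch onto an image and reports the fraction of pixels it covers. It assumes that the MNIST trigger is a 2×4 patch on a 28×28 image, which covers 8 of 784 pixels and matches the "roughly 1%" budget stated for BadNets; the patch position, the pixel value, and the function names are hypothetical.

```python
def stamp_patch(image, patch_h, patch_w, top, left, value=1.0):
    """Return a copy of `image` with a solid patch written at (top, left)."""
    out = [row[:] for row in image]  # copy so the benign image is untouched
    for r in range(top, top + patch_h):
        for c in range(left, left + patch_w):
            out[r][c] = value
    return out


def patch_fraction(image, patch_h, patch_w):
    """Fraction of the image area covered by the patch."""
    return (patch_h * patch_w) / (len(image) * len(image[0]))


benign = [[0.0] * 28 for _ in range(28)]  # blank 28x28 MNIST-sized image
poisoned = stamp_patch(benign, patch_h=2, patch_w=4, top=25, left=23)
fraction = patch_fraction(benign, 2, 4)
print(f"patch covers {fraction:.2%} of the image")
```

Because the same pixels are overwritten in every poisoned sample, such a trigger is easy to spot visually and gives anomaly detectors a consistent signal, which is the weakness the feature-based attacks try to avoid.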

Defense Methods: We use two defense methods: autodecoder defense [6] and cluster detection [3]. The autodecoder reconstructs samples for the MNIST [5] and CIFAR10 [12] datasets and detects abnormal samples by the loss between the input and the output. However, the face datasets have too many pixels, so it is difficult to train the autodecoder and the reconstruction effect is poor. Besides, to improve the poisoning attack’s effectiveness on the face classification model, the attacker will embed the poisoned backdoor near the face. Therefore, for backdoor attack defense on the face classification model, we use Dlib for face feature extraction and DBSCAN for clustering. Cluster detection defense is carried out on the face datasets by the DBSCAN clustering algorithm, which clusters the hidden layer features of the training set samples and finds the abnormal samples.
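The reconstruction-based detection idea can be sketched as follows: a sample is flagged when the loss between the input and its reconstruction exceeds a threshold. The `reconstruct` callable stands in for a trained autodecoder and is replaced here by a crude placeholder that always returns the mean clean sample; the threshold value and all names are assumptions for illustration only.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def flag_anomalies(samples, reconstruct, threshold):
    """Return indices of samples whose reconstruction loss exceeds the threshold."""
    return [
        i for i, sample in enumerate(samples)
        if mse(sample, reconstruct(sample)) > threshold
    ]


# Toy data: two clean samples and one sample carrying a bright 2-pixel patch.
clean_a = [0.2] * 8
clean_b = [0.2] * 8
patched = [0.2] * 6 + [1.0, 1.0]
clean_mean = [0.2] * 8


def reconstruct(sample):
    """Placeholder for a trained autodecoder: always outputs the clean mean."""
    return clean_mean


flagged = flag_anomalies([clean_a, clean_b, patched], reconstruct, threshold=0.05)
print(flagged)
```

A prominent patch cannot be reproduced by a model trained on clean data, so its reconstruction loss stands out; a perturbation that stays close to the benign sample in pixel space produces a much smaller loss and is harder to separate with a single threshold.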

We do not use NC [21] and ABS [14] because neither of these defense methods can work against a feature-based attack [17].

Evaluation Metrics: The metrics used in the experiments are defined as follows:

  1. Recognition accuracy (acc): the ratio of the number of benign samples correctly classified by the target model to the number of all samples.

  2. Attack success rate (ASR): the ratio of the number of trigger samples misclassified as the target label by the target model after the attack to the number of all trigger samples.

(a) MNIST
(b) CIFAR10
(c) LFW
(d) CASIA
Fig. 3: The relationship between attack success rate and poison ratio of BadNets (BN) [8], AIS [4], BIS [4], PoisonFrog (PF) [18], IPA [3], FSA [22], Hidden Trigger attack (HTA) [16] and DeepPoison (DP) on the MNIST [5], CIFAR10 [12], LFW [10] and CASIA [13] datasets. DeepPoison achieves a higher ASR when the attackers use the same ratio of poison samples.
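The two evaluation metrics can be computed as in the following sketch. It assumes that acc is taken over benign samples and that the ASR denominator is the number of trigger samples; the function names and the toy labels are hypothetical.

```python
def recognition_accuracy(predictions, labels):
    """Fraction of benign samples whose predicted label equals the true label."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def attack_success_rate(trigger_predictions, target_label):
    """Fraction of trigger samples classified as the attacker's target label."""
    hits = sum(1 for p in trigger_predictions if p == target_label)
    return hits / len(trigger_predictions)


# Toy example with MNIST-style digit labels.
acc = recognition_accuracy(predictions=[4, 9, 4, 1], labels=[4, 9, 9, 1])
asr = attack_success_rate(trigger_predictions=[4, 4, 9, 4], target_label=4)
print(acc, asr)
```

Reporting both values together matters: a stealthy attack should keep acc close to that of the clean model while raising the ASR.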

III-B The Effectiveness of DeepPoison

Compare DeepPoison with other poisoning attacks. DeepPoison is compared with the seven baseline attack methods to demonstrate its effectiveness. To make fair comparisons, we set the same poison ratio for all attacks and use the fixed feature extractor for poisoning training. We analyze the attack performance by observing the attack success rate (ASR) in the testing phase. In Fig. 3, we can see that DeepPoison consistently achieves a better ASR than the baseline methods across all the datasets.

BadNets [8] performs the worst among the baselines because it is not optimized for the facial datasets, so BadNets attacks [8] can easily be detected. Meanwhile, AIS [4] and BIS [4] have been optimized for the facial datasets, but not for each individual face. PoisonFrog [18], IPA [3], Hidden Trigger attack [16] and FSA [22] all need a fixed centroid as their optimization goal, so they cannot find the optimal perturbation for each benign sample.

Among the four datasets, MNIST is relatively less challenging at a low poison percentage, while on CASIA the ASR is low when the poison percentage is low. The reason is that the more complex the dataset, the higher the model fitting requirements of the poisoned sample generator, and the more difficult it is to train.

(a) Inter-class similarity
(b) Attack success rate
Fig. 4: The similarity of inter-class samples and the corresponding ASR. As the similarity of inter-class samples increases, the ASR of the corresponding class improves. (When the original class and the target class are the same, the value is set to zero.)

How does inter-class similarity impact the ASR of DeepPoison? DeepPoison implements feature transferring, so we explore whether its ASR is influenced by inter-class similarity as measured by the feature distance (we compute the Hamming distance between the dHash values of two categories to obtain their similarity). We conduct this experiment on MNIST and present the results in Fig. 4. Interestingly, as the similarity of inter-class features increases, the ASR of DeepPoison also increases. On further investigation, we observed that when the similarity between inter-class features is high, DeepPoison converges faster, partially because it transfers the feature of a benign sample to a poison sample more quickly in such cases.
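The dHash-based similarity mentioned above can be sketched in a few lines. The sketch assumes one grayscale image per category (how a category is reduced to a single image is not specified here), uses a nearest-neighbour resize instead of the smoothed resize of typical dHash implementations, and normalises the Hamming distance into a similarity in [0, 1]; these choices and the function names are assumptions.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour downsampling of a 2D list of grayscale values."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def dhash(image, hash_size=8):
    """Difference hash: one bit per horizontally adjacent pixel pair."""
    small = resize_nearest(image, hash_size, hash_size + 1)
    bits = 0
    for row in small:
        for c in range(hash_size):
            bits = (bits << 1) | (1 if row[c] > row[c + 1] else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of differing bits between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


def dhash_similarity(image_a, image_b, hash_size=8):
    """Similarity in [0, 1]: 1 minus the normalised Hamming distance."""
    distance = hamming_distance(dhash(image_a, hash_size), dhash(image_b, hash_size))
    return 1.0 - distance / (hash_size * hash_size)


# Two toy 28x28 images: brightness rising vs. falling from left to right.
rising = [[c for c in range(28)] for _ in range(28)]
falling = [[27 - c for c in range(28)] for _ in range(28)]
print(dhash_similarity(rising, rising), dhash_similarity(rising, falling))
```

Since dHash encodes only the sign of local brightness gradients, two categories with similar stroke layouts receive a small Hamming distance even when their absolute pixel intensities differ.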

p-value           IPA [3]   PoisonFrog [18]   Hidden Trigger [16]   FSA [22]
Without Defense   0.06      0.12              0.03                  0.19
With Defense      0.13      0.23              0.37                  0.57
TABLE I: With the poison ratio set to 7%, we use 1,000 trigger samples and 1,000 benign testing samples to obtain the outputs of four stealthy poisoning attacks, and then compute the p-value of the significance test to validate the advantage of DeepPoison.

III-C The Stealthiness of DeepPoison

We verify the stealthiness of poisoning attacks by visualization of case study, cost analysis, and anomaly detection of the generated poisoned samples.

(a) poisoned samples
(b) trigger samples
Fig. 5: Visualization of poisoned samples during training and trigger samples during testing. The top row contains nine scenarios: the benign sample, and the poisoned samples of BadNets [8], AIS [4], BIS [4], IPA [3], PoisonFrog [18], Hidden trigger attack [16], FSA [22] and DeepPoison. The second row contains the constructed trigger sample corresponding to each of these attacks.

Fig. 5 shows the visualization of the poisoned samples in the training phase and the trigger samples in the testing phase of different poisoning attacks.

As can be seen from Fig. 5, the watermarks of the poisoned samples in BadNets [8], AIS [4] and BIS [4] are relatively obvious, whereas the poisoned samples of IPA [3], PoisonFrog [18], Hidden trigger attack [16], FSA [22] and DeepPoison only appear as a deterioration of image quality. As can be seen from the visualization results of the trigger samples, the stealthiness of BadNets [8], AIS [4], Hidden trigger attack [16] and BIS [4] is low, while IPA [3], PoisonFrog [18], FSA [22] and DeepPoison do not need to change the benign testing sample, which significantly improves the stealthiness.

(b) CIFAR10
(c) LFW
Fig. 6: The ASR of different poisoning attacks before and after anomaly detection. DeepPoison achieves a high ASR compared to the pixel-attack baselines without defense (WoD). Meanwhile, its ASR does not drop steeply with defense (WD), which demonstrates the robustness of DeepPoison.

Evaluate the stealthiness of DeepPoison by anomaly detection. We use anomaly detection mechanisms to defend against the two kinds of poisoning attacks. As shown in Fig. 6, even with an anomaly detection defense mechanism, DeepPoison still has a high attack success rate. On the universal datasets, BadNets [8], AIS [4] and BIS [4] failed because the poison patch in the poisoned sample was prominent and the autodecoder defense could not restore the poisoned patch during sample reconstruction. With the same perturbation (glasses, watermark) applied to the human face, the poison samples of AIS [4] and BIS [4] share similar features and tend to be clustered in groups. The DBSCAN clustering method can thus easily detect the poison samples, leading to a low trigger success rate for AIS [4] and BIS [4]. For IPA [3], PoisonFrog [18], Hidden trigger attack [16], FSA [22] and DeepPoison, defense algorithms cannot easily detect the adversarial samples, which consist of both benign sample features and poison sample features.

Evaluate the effect of DeepPoison by the significance test.

In the sections above, we demonstrated the attack effectiveness and stealthiness of DeepPoison in comparison with other methods. For further verification, we employ a significance test, namely Student’s t-test, to highlight our advantage. First, we hypothesize that DeepPoison has a worse attack effect than the other attacks. Then, we choose 1,000 trigger samples and collect the output of the last fully connected layer. In our test, a p-value > 0.05 is taken to mean that the hypothesis is false. From TABLE I, we find that, both without and with defense, the other stealthy poisoning attacks generally have lower performance than DeepPoison. The exception is the Hidden Trigger attack [16] without a defense: we conjecture that this is because it inserts both a poison sample and a patch to complete the attack, which leads to a trigger sample with a patch at testing time. However, this exception disappears once a defense strategy is employed to eliminate the patch (its p-value in TABLE I rises from 0.03 to 0.37).
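For readers who want to reproduce this kind of comparison, the sketch below computes a pooled-variance Student's t statistic for two sets of scores. It makes several assumptions: the scores are synthetic rather than real model outputs, the test is two-sided, and the p-value uses a normal approximation to the t distribution, which is reasonable only for large samples such as the 1,000 used here. The function name and data are hypothetical.

```python
import math
import random
import statistics


def student_t_test(a, b):
    """Return (t, p) for a two-sided, pooled-variance two-sample t-test.

    The p-value uses a normal approximation, adequate for large samples.
    """
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    p = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t)))
    return t, p


rng = random.Random(0)
# Synthetic stand-ins for per-sample outputs of two attacks (not real data).
scores_a = [rng.gauss(0.0, 1.0) for _ in range(1000)]
scores_b = [rng.gauss(1.0, 1.0) for _ in range(1000)]

t_same, p_same = student_t_test(scores_a, scores_a)
t_diff, p_diff = student_t_test(scores_a, scores_b)
print(t_same, p_same)
print(t_diff, p_diff)
```

Comparing a sample with itself gives t = 0 and the largest possible p-value, while two samples whose means differ by one standard deviation give a p-value far below 0.05 at this sample size.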

IV Conclusion

We propose DeepPoison, a stealthy poisoning attack method based on GAN. Compared to other poisoning attacks, the poisoned samples generated by DeepPoison are less noticeable during the training phase, and the resulting attack models are triggered by benign samples, making it more usable in real applications.

In future work, we plan to optimize the perturbation method to ensure the poisoning attack’s effectiveness and the model’s availability after the poisoning attack. Optimizing the training method can make the GAN reach a stable state more quickly and improve the attack’s efficiency and accuracy. Moreover, we plan to develop a corresponding defense mechanism to enhance the deep learning model’s security against DeepPoison.


This research was supported by the National Natural Science Foundation of China under Grant No. 62072406 and the Natural Science Foundation of Zhejiang Province under Grant No. LY19F020025.


References

  • [1] M. Barreno, B. Nelson, R. Sears, A. D. Joseph, and J. D. Tygar (2006) Can machine learning be secure?. In Proceedings of the 2006 ACM Symposium on Information, computer and communications security, pp. 16–25.
  • [2] J. Chen, Y. Wu, C. Jia, H. Zheng, and G. Huang (2019) Customizable text generation via conditional text generative adversarial network. Neurocomputing, pp. 1–11.
  • [3] J. Chen, H. Zheng, M. Su, T. Du, C. Lin, and S. Ji (2019) Invisible poisoning: highly stealthy targeted poisoning attack. In International Conference on Information Security and Cryptology, pp. 173–198.
  • [4] X. Chen, C. Liu, B. Li, K. Lu, and D. Song (2017) Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526.
  • [5] L. Deng (2012) The mnist database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine 29 (6), pp. 141–142.
  • [6] M. Du, R. Jia, and D. Song (2019) Robust anomaly detection and backdoor attack detection via differential privacy. arXiv preprint arXiv:1911.07116.
  • [7] A. Graves, A. Mohamed, and G. Hinton (2013) Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing, pp. 6645–6649.
  • [8] T. Gu, B. Dolan-Gavitt, and S. Garg (2017) Badnets: identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733.
  • [9] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778.
  • [10] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller (2007-10) Labeled faces in the wild: a database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst.
  • [11] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105.
  • [12] H. Li, H. Liu, X. Ji, G. Li, and L. Shi (2017) Cifar10-dvs: an event-stream dataset for object classification. Frontiers in neuroscience 11, pp. 309.
  • [13] S. Li, D. Yi, Z. Lei, and S. Liao (2013) The casia nir-vis 2.0 face database. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 348–353.
  • [14] Y. Liu, W. Lee, G. Tao, S. Ma, Y. Aafer, and X. Zhang (2019) ABS: scanning neural networks for back-doors by artificial brain stimulation. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pp. 1265–1282.
  • [15] B. I. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S. Lau, S. Rao, N. Taft, and J. D. Tygar (2009) Antidote: understanding and defending against poisoning of anomaly detectors. In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement, pp. 1–14.
  • [16] A. Saha, A. Subramanya, and H. Pirsiavash (2020) Hidden trigger backdoor attacks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 11957–11965.
  • [17] A. Salem, R. Wen, M. Backes, S. Ma, and Y. Zhang (2020) Dynamic backdoor attacks against machine learning models. arXiv preprint arXiv:2003.03675.
  • [18] A. Shafahi, W. R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras, and T. Goldstein (2018) Poison frogs! targeted clean-label poisoning attacks on neural networks. In Advances in Neural Information Processing Systems, pp. 6103–6113.
  • [19] I. Sutskever, O. Vinyals, and Q. V. Le (2014) Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pp. 3104–3112.
  • [20] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich (2015) Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9.
  • [21] B. Wang, Y. Yao, S. Shan, H. Li, B. Viswanath, H. Zheng, and B. Y. Zhao (2019) Neural cleanse: identifying and mitigating backdoor attacks in neural networks. In 2019 IEEE Symposium on Security and Privacy (SP), pp. 707–723.
  • [22] P. Zhao, S. Wang, C. Gongye, Y. Wang, Y. Fei, and X. Lin (2019) Fault sneaking attack: a stealthy framework for misleading deep neural networks. In 2019 56th ACM/IEEE Design Automation Conference (DAC), pp. 1–6.