1 Introduction & Formalism
The recent popularity of machine learning calls for a deeper understanding of AI security. Amongst the numerous AI threats published so far, poisoning attacks currently attract considerable attention.
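An ML algorithm is a state machine with a two-phase life-cycle. During the first phase, called training, the algorithm builds a model (captured by a state variable) based on sample data, called “training data”, in which each element pairs a data item with its label.
Learning is hence defined as the repeated update of the state variable from the training data.
For example, the data items can be human face images, each associated with a label.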
During the second phase, called testing (or inference), the algorithm is given unlabelled data. Its goal is to predict the corresponding label as accurately as possible, given the distribution inferred from the training data.
The dataset used during testing associates each input with its correct label (solution) and with the label predicted by the algorithm (if the algorithm is perfect, the two labels coincide).
In a poisoning attack, the opponent partially tampers with the training data to influence the model and mislead the algorithm during testing. Formally, the attacker generates a poison dataset resulting in a corrupted model that mislabels inputs during testing.
Poisoning attacks were successfully implemented by tampering with both incremental and periodic training models. In the incremental training model (also called the incremental update model), whenever a new input is seen during testing, the algorithm’s performance on it is evaluated and the model is updated. In the periodic retraining model, data is stored in a buffer. When the algorithm falls below a performance threshold (or after a fixed number of queries), the buffer’s data is used to retrain it anew. Retraining is done either using the buffer alone (resulting in a totally new model) or by merging the buffer with previous information (updating the existing model).
Protections against poisoning attacks can be categorized into two types: robustification and sanitizing:
Robustification (built-in resistance) modifies the algorithm so that it takes the poison into account but tolerates its effect. Note that the algorithm does not need to identify the poisoned data as such, but the effect of poisonous data must be diminished, dampened or nullified up to a point fit for purpose.
The two main robustification techniques discussed in the literature are:
Feature squeezing [32, 26] is a model hardening technique that reduces data complexity so that adversarial perturbations disappear because of low sensitivity. Usually the quality of the incoming data is degraded by encoding colors with fewer values or by using a smoothing filter over the images. This maps several inputs onto one “characteristic” or “canonical input” and reduces the perturbations introduced by the attacker. While useful in practice, these techniques inevitably degrade the algorithm’s accuracy.
This work prevents poisoning by sanitizing.
Figure 1 shows a generic abstraction of sanitizing. The algorithm takes training data (periodically or incrementally) and outputs a model for the testing phase. However, incoming data go through the poisoning detection module Det before entering the algorithm. If Det decides that the probability that some of the incoming data is poisoned is too high, the suspicious data is discarded to avoid corrupting the model.
Because, under normal circumstances, the data already accepted and the incoming data are drawn from the same distribution, it is natural to implement Det using standard algorithms that test the hypothesis that both come from the same distribution.
The most natural tools for doing so are nonparametric hypothesis tests (NPHTs). Given two datasets, an NPHT judges how compatible an observed difference between them is with the hypothesis that both were drawn from the same distribution.
It is important to underline that such a test is nonparametric, i.e. it makes no assumptions on the underlying distribution.
The above makes NPHTs natural candidates for detecting poison. However, whilst NPHTs are very good for natural hypothesis testing, they succumb spectacularly in adversarial scenarios where the attacker has full knowledge of the target’s specification. Indeed, Section 3.1 illustrates such a collapse.
To regain the upper hand over the attacker, our strategy consists in mapping both datasets into a secret space, unpredictable by the adversary, where the NPHT can work confidentially. This mapping is defined by a key, making it hard for the adversary to design poisoned data that fools the test.
2 A Brief Overview of Poisoning Attacks
Classical references introducing poisoning are [23, 22, 20, 24, 4, 30, 21, 31, 18, 7, 15]. At times the opponent does not create or modify training elements but rather adds legitimate but carefully chosen ones to the training data to bias learning. Those inputs are usually computed using gradient descent. This approach was later generalized.
During a poisoning attack, modifications can concern either data or labels. It was shown that a random flip of 40% of labels suffices to seriously affect SVMs, and that inserting malicious points into the training data could gradually shift the decision boundary of an anomaly detection classifier. Poisoning points were obtained by solving a linear programming problem maximizing the mean displacement of the mass center of the training data. For a more complete overview, we refer the reader to the surveys cited in the reference list.
2.0.1 Adversarial Goals.
Poisoning may seek to influence the classifier’s decision when presented with a later target query or to leak information about the training data or the model.
The attacker’s goals always apply to the testing phase and may be:
Confidence Reduction: Have the algorithm make more errors. In many cases, “less confidence” can clear suspicious instances by giving them the benefit of the doubt (in dubio pro reo).
Mis-classification attacks: are defined by instantiating the two parameters of the following definition:
“Make the algorithm conclude that an input belongs to a wrong label.”
Mis-classification: random, random. This is useful if any mistake may serve the opponent’s interests, e.g. any navigation error would crash a drone with high probability.
2.0.2 Adversarial capabilities
Adversarial capabilities designate the degree of knowledge that the attacker has of the target system. The literature distinguishes between training phase capabilities and testing phase capabilities. Poisoning assumes training phase capabilities.
The attacker’s capabilities may be:
Data Injection: Add new data to the training data.
Data Modification: Modify the training data before training.
Logic Corruption: Modify the code (behavior) of the algorithm (this is the equivalent of fault attacks in cryptography).
3 Keyed Anti-Poisoning
To illustrate our strategy, we use Mann-Whitney’s U-test and Stouffer’s method, both of which we recall in the appendix.
We assume that when training starts, we are given a safe subset of the training data, hereafter the safe set. Our goal is to assess the safety of the upcoming subset, hereafter the unknown set.
We assume that the safe set and the unknown set come from the same distribution. As mentioned before, the idea is to map both sets to a space hidden from the opponent. The mapping is keyed to prevent the attacker from predicting how to create adversarial input that fools the test.
The test can be Mann-Whitney’s test (illustrated in this paper) or any other NPHT, e.g. the location test, the paired t-test, the Siegel-Tukey test, the variance test, or multidimensional tests such as deep gaussianization.
3.1 Trivial Mann-Whitney Poisoning
Let the test be Mann-Whitney’s U-test, returning a p-value.
This test is, among others, susceptible to poisoning as follows: assume that the safe set is sampled from a Gaussian distribution and that the unknown set is sampled from a different distribution (Figure 3). While the two distributions are totally different, the test will be misled.
For instance, after picking samples from both distributions, we get a high p-value. From Mann-Whitney’s perspective, the two sets come from the same parent distribution with a very high degree of confidence while, evidently, they do not.
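The collapse described above can be reproduced with a short Python sketch. It is illustrative only and rests on stated assumptions: the honest data is a standard Gaussian, the poison is a symmetric two-cluster distribution centred at -mu and +mu (a choice suggested by the later remark that attack samples sit at the extremes), and the test uses the normal approximation of the Mann-Whitney statistic without tie correction. The names `mann_whitney_p` and `bimodal_poison`, the sample size of 100 and mu = 3 are hypothetical and not taken from the experiments.

```python
import math
import random
from statistics import NormalDist


def mann_whitney_p(xs, ys):
    """Two-sided Mann-Whitney p-value (normal approximation, no tie correction)."""
    n1, n2 = len(xs), len(ys)
    # U counts the pairs (x, y) with x > y; a tie counts for one half.
    u = sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)
    mean_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sigma_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def bimodal_poison(rng, n, mu, sigma=1.0):
    """Hypothetical poison: half of the points near -mu, half near +mu."""
    return [rng.gauss(mu if i % 2 else -mu, sigma) for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(2024)
    honest = [rng.gauss(0.0, 1.0) for _ in range(100)]
    poison = bimodal_poison(rng, 100, mu=3.0)
    shifted = [rng.gauss(3.0, 1.0) for _ in range(100)]
    print("symmetric poison p-value:", mann_whitney_p(honest, poison))
    print("one-sided shift p-value: ", mann_whitney_p(honest, shifted))
```

Because the symmetric poison keeps the probability that an honest point exceeds a poisoned one at one half, the rank statistic stays near its mean under the null hypothesis, whereas a one-sided shift of the same magnitude moves it far away.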
3.2 Keyed Mann-Whitney
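We instantiate the keyed mapping by secret random polynomials, i.e. polynomials whose coefficients are randomly and secretly refreshed before each invocation of the test. Instead of returning the p-value of the test applied to the raw sets, Det returns the p-value of the test applied to their images under the secret polynomial.
The rationale is that the polynomial will map the attacker’s input to an unpredictable location in which the Mann-Whitney test is very likely to be safe.
Several random polynomials are selected as keys and Det calls the test once for each polynomial. To aggregate all resulting p-values, Det combines them using Stouffer’s method (recalled in the appendix).
If the aggregated p-value falls below the chosen threshold, the sample is rejected as poisonous with very high probability.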
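The keyed procedure can be sketched as follows. This is a minimal illustration under assumptions, not the implementation used in the experiments: coefficients are drawn uniformly from [-1, 1] (the actual range is not restated here), the rejection threshold `alpha` is set to 0.05, and the function names (`random_polynomial`, `keyed_p_value`, `det`) are hypothetical. The test is the normal approximation of Mann-Whitney without tie correction.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mann_whitney_p(xs, ys):
    """Two-sided Mann-Whitney p-value (normal approximation, no tie correction)."""
    n1, n2 = len(xs), len(ys)
    u = sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def random_polynomial(rng, degree=4, bound=1.0):
    """Secret key: a polynomial with coefficients uniform in [-bound, bound] (assumed range)."""
    coeffs = [rng.uniform(-bound, bound) for _ in range(degree + 1)]
    return lambda x: sum(c * x ** i for i, c in enumerate(coeffs))


def stouffer(p_values, eps=1e-12):
    """Combine p-values via Z = sum(Phi^-1(1 - p_i)) / sqrt(k) and p = 1 - Phi(Z)."""
    # Clamp to (0, 1) because the inverse CDF is undefined at 0 and 1.
    zs = [STD_NORMAL.inv_cdf(1 - min(max(p, eps), 1 - eps)) for p in p_values]
    return 1 - STD_NORMAL.cdf(sum(zs) / math.sqrt(len(zs)))


def keyed_p_value(safe, unknown, keys):
    """Apply each secret key to both samples, test the images, aggregate the p-values."""
    return stouffer([
        mann_whitney_p([f(x) for x in safe], [f(y) for y in unknown])
        for f in keys
    ])


def det(safe, unknown, rng, n_keys=9, alpha=0.05):
    """Return True when the unknown sample is rejected as poisonous."""
    keys = [random_polynomial(rng) for _ in range(n_keys)]  # refreshed at every call
    return keyed_p_value(safe, unknown, keys) < alpha
```

Passing the identity function as the only key reproduces the unprotected test, which is convenient as a control. Note that the p-values obtained with different keys on the same data are not independent, whereas Stouffer’s derivation assumes independence; the sketch aggregates them regardless, as the text does.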
Note that any smooth function can be used as the keyed mapping, e.g. B-splines. The criterion on the mapping is that the random selection process must yield significantly different functions.
We illustrate the above by protecting a Gaussian distribution. The good thing about this choice is that random polynomials tend to diverge as their input grows in magnitude but adapt well to the central interval in which the Gaussian is not negligible.
We attack the Gaussian by poisoning it with a second distribution whose parameter is set to 3, 2, 1, and 0.5, respectively. For each parameter value, two sets of 50 samples are drawn from the two distributions. Those samples are then transformed into other sets by applying a random polynomial of degree 4 and then fed into the Mann-Whitney test to obtain a p-value (using the two-sided mode). This p-value indicates whether these two sets of transformed samples come from the same distribution: a p-value close to 0 is strong evidence against the null hypothesis. In each of our experiments, we apply nine secret random polynomials of degree 4 and aggregate the resulting p-values using Stouffer’s method. For each setting, we run 1000 simulations. Similarly, for the same polynomials and parameter values, we run an “honest” test, where both samples come from the same distribution.
We thus retrieve 1000 “attack” p-values, which we sort in ascending order. Similarly, we sort the “honest” p-values. It is a classic result that, under the null hypothesis, a p-value follows a uniform distribution over [0, 1], hence a plot of the sorted “honest” p-values is a linear curve.
An attack is successful if, on average, the “attack” sample is accepted at least as often as the “honest” sample. Hence, a sufficient condition for an attack to succeed is that the curve of sorted attack p-values (solid lines in our figures) lies above the curve of sorted honest p-values (dashed lines).
The first quadrant illustrates the polynomials used in the simulation and two bars for the poisoning parameter. The same random polynomials were used for each experiment. For simplicity, the coefficients of the polynomials were selected uniformly at random from a fixed interval, and (useless) polynomials of degree lower than 2 were excluded from the random selection. We also added the identity polynomial (poly0), as a control showing what happens when there is no protection.
The following nine quadrants give the distribution of p-values for each polynomial, over 1000 simulations, sorted in increasing order. The dotted curve corresponds to what an honest user would obtain, whereas the solid curve is based on poisoned datasets.
The last quadrant contains the sorted distribution of the p-values aggregated using Stouffer’s method. Experimental results show that for poisoned datasets, the aggregated p-values remain close to zero, while an honest dataset does not appear to be significantly affected. In other words, with very high probability, keyed testing detects poisoning.
We observe a saturation when the poison lies too far from the center of the Gaussian: even after passing through the polynomial, the attack samples remain at the extremes. Hence, if the polynomial is of odd degree, nothing changes. If the degree of the polynomial is even, then the two extremes are projected to the same side and Mann-Whitney detects 100% of the poisonings. It follows that, at saturation, a keyed Mann-Whitney gives either a very high or a very low p-value. This means that polynomials or B-splines must be carefully chosen to make keying effective.
The advantage of combining the p-values with Stouffer’s method is that weak p-values are very penalizing (as opposed to Pearson’s method, whose combined p-value degrades much more slowly). A more conservative aggregation would be to use Fisher’s method.
All in all, experimental results reveal that keying managed to endow the test with a significant level of immunity.
Interestingly, Det can be implemented independently of the learning algorithm.
A cautionary note: Our scenario assumes that testing does not start before learning is complete. If the opponent can alternate learning and testing, then he may infer that a poisonous sample was taken into account (if the model was updated and the algorithm’s behavior was modified). This may open the door to attacks on the keyed detector.
4 Notes and Further Research
This paper opens perspectives beyond the specific poisoning problem. For example, cryptographers frequently use randomness tests to assess the quality of random number generators. In a strong attack model where the attacker knows and controls the random source, it becomes possible to trick many tests into declaring flagrantly non-random data as random. Hence, the authors believe that developing keyed randomness tests is important and useful as such.
For instance, in the original minimum distance test, 8000 points sampled from the tested randomness source are placed in a square. Consider the minimum distance between the pairs of points. If the source is random, then the square of this minimum distance is exponentially distributed with a known mean. To key the test, a secret permutation of the plane can be generated and the test can be applied to the permuted set of points.
To the best of our knowledge, such primitives have not been proposed yet.
References
[1] (2018) Prime and prejudice: primality testing under adversarial conditions. Cryptology ePrint Archive, Report 2018/749, https://eprint.iacr.org/2018/749.
[2] (2010-11-01) The security of machine learning. Machine Learning 81 (2), pp. 121–148.
[3] (2011-14–15 Nov) Support vector machines under adversarial label noise. In Proceedings of the Asian Conference on Machine Learning, C. Hsu and W. S. Lee (Eds.), Proceedings of Machine Learning Research, Vol. 20, South Garden Hotels and Resorts, Taoyuan, Taiwan, pp. 97–112.
[4] (2012) Poisoning attacks against support vector machines. In Proceedings of the 29th International Conference on Machine Learning, ICML’12, USA, pp. 1467–1474.
[5] Wild patterns: ten years after the rise of adversarial machine learning. Pattern Recognition 84, pp. 317–331.
[6] (1975) A method for combining non-independent, one-sided tests of significance. Biometrics 31 (4), pp. 987–992.
[7] (2017) Analysis of Causative Attacks Against SVMs Learning from Data Streams. In Proceedings of the 3rd ACM on International Workshop on Security And Privacy Analytics, IWSPA ’17, New York, NY, USA, pp. 31–36.
[8] (2008-05) Casting out demons: sanitizing training data for anomaly sensors. In 2008 IEEE Symposium on Security and Privacy (sp 2008), pp. 81–95.
[9] (2019) Quotient hash tables - efficiently detecting duplicates in streaming data. CoRR abs/1901.04358.
[10] (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680.
[11] (2018) Choosing between methods of combining p-values. Biometrika 105 (1), pp. 239–246.
[12] Learning in the presence of malicious errors. Proceedings of the Twentieth Annual ACM Symposium on Theory of Computing, STOC ’88, New York, NY, USA, pp. 267–280.
[13] (1883-01) La cryptographie militaire. In Journal des sciences militaires, Vol. IX, pp. 5–38.
[14] Online anomaly detection under adversarial impact. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Y. W. Teh and M. Titterington (Eds.), Proceedings of Machine Learning Research, Vol. 9, Chia Laguna Resort, Sardinia, Italy, pp. 405–412.
[15] (2017) Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, pp. 1885–1894.
[16] (2002) Combining dependent p-values. Statistics & Probability Letters 60 (2), pp. 183–190.
[17] (2016) Curie: A method for protecting SVM Classifier from Poisoning Attack. ArXiv abs/1606.01584.
[18] (2015) Using machine teaching to identify optimal training-set attacks on machine learners. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI’15, pp. 2871–2877.
[19] (2014) Bloom filters in adversarial environments. CoRR abs/1412.8356.
[20] (2008) Exploiting machine learning to subvert your spam filter. In Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats, LEET’08, Berkeley, CA, USA, pp. 7:1–7:9.
[21] On the practicality of integrity attacks on document-level sentiment analysis. In Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop, AISec’14, New York, NY, USA, pp. 83–93.
[22] (2006) Paragraph: thwarting signature learning by training maliciously. In Proceedings of the 9th International Conference on Recent Advances in Intrusion Detection, RAID’06, Berlin, Heidelberg, pp. 81–105.
[23] (2006-05) Misleading worm signature generators using deliberate noise injection. In 2006 IEEE Symposium on Security and Privacy (S&P’06), pp. 15 pp.–31.
[24] (2009) ANTIDOTE: Understanding and Defending Against Poisoning of Anomaly Detectors. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, IMC ’09, New York, NY, USA, pp. 1–14.
[25] (2018) Defense-GAN: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605.
[26] (2018) Defending against adversarial images using basis functions transformations. ArXiv abs/1803.10840.
[27] (1949) The American Soldier, Vol.1: Adjustment during Army Life. Princeton University Press, Princeton.
[28] (2019) Bridging machine learning and cryptography in defence against adversarial attacks. In Computer Vision – ECCV 2018 Workshops, L. Leal-Taixé and S. Roth (Eds.), Cham, pp. 267–279.
[29] (2019) Population anomaly detection through deep gaussianization. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, SAC ’19, New York, NY, USA, pp. 1330–1336.
[30] (2012) Adversarial label flips attack on support vector machines. In Proceedings of the 20th European Conference on Artificial Intelligence, ECAI’12, Amsterdam, NL, pp. 870–875.
[31] Is feature selection secure against training data poisoning?. In International Conference on Machine Learning, pp. 1689–1698.
[32] Feature squeezing: detecting adversarial examples in deep neural networks. In Proceedings of the 25th Annual Network and Distributed System Security Symposium (NDSS).
Appendix 0.A Mann-Whitney’s U-Test
Let $X$ and $Y$ be two sample sets, of respective sizes $n_X$ and $n_Y$, drawn from arbitrary distributions.
Mann-Whitney’s U-test is a non-parametric hypothesis test. The test assumes that the two compared sample sets are independent and that a total order exists on their elements (which is the case for real-valued data such as ML feature vectors).
The null hypothesis $H_0$ is that $X$ and $Y$ are drawn from the same distribution.
The alternative hypothesis $H_1$ is that they are not.
The test is consistent (i.e., its power increases with $n_X$ and $n_Y$) when, under $H_1$, $P(x > y) \neq P(y > x)$ for $x \in X$ and $y \in Y$.
The test computes a statistic called $U$, whose distribution under $H_0$ is known. When $n_X$ and $n_Y$ are large enough, the distribution of $U$ under $H_0$ is approximated by a normal distribution of known mean and variance.
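For two sample sets $X$ and $Y$ of sizes $n_X$ and $n_Y$, the statistic $U$ is computed as follows:
Merge the elements of $X$ and $Y$. Sort the resulting list in ascending order.
Assign a rank to each element of the merged list. Equal elements get as rank the midpoint of their adjusted rankings (e.g., in a sorted list where the value 4 occupies positions 3, 4 and 5, the fours all get the rank 4).
Sum the ranks for each set. Let $R_X$ be this sum for $X$. Note that if $R_Y$ is the corresponding sum for $Y$ then $R_X + R_Y = N(N+1)/2$, with $N = n_X + n_Y$.
Let $U_X = R_X - n_X(n_X+1)/2$, define $U_Y$ likewise, and let $U = \min(U_X, U_Y)$.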
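When the sample sizes $n_X$ and $n_Y$ are large enough, $U$ approximately follows a normal distribution.
Hence, one can check if the value
$$z = \frac{U - m_U}{\sigma_U}$$
follows a standard normal distribution under $H_0$, with $m_U = n_X n_Y / 2$ being the mean of $U$, and $\sigma_U$ its standard deviation under $H_0$:
$$\sigma_U = \sqrt{\frac{n_X n_Y (n_X + n_Y + 1)}{12}}$$
However, the previous formulae are only valid when there are no tied ranks. For tied ranks, the following formula is to be used:
$$\sigma_U = \sqrt{\frac{n_X n_Y}{12}\left((N + 1) - \sum_{i} \frac{t_i^3 - t_i}{N(N-1)}\right)}$$
where $N = n_X + n_Y$ and $t_i$ is the number of elements sharing the $i$-th distinct rank.
Because, under $H_0$, $z$ follows a standard normal distribution, we can estimate the likelihood that the observed value comes from a standard normal distribution, hence getting the related p-value from the standard normal table.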
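The computation recalled in this appendix can be sketched in Python as follows. The sketch follows the steps above (merge, rank with midpoints for ties, rank sum, normal approximation) and relies on the standard textbook expressions for the mean and the tie-corrected standard deviation of the statistic; the function names `midranks` and `mann_whitney_u` are hypothetical, and a library routine would normally be preferred in practice.

```python
import math
from collections import Counter
from statistics import NormalDist


def midranks(values):
    """1-based ranks; equal elements share the midpoint of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # midpoint of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def mann_whitney_u(xs, ys):
    """Return (U, z, two-sided p-value) with the tie-corrected normal approximation."""
    n_x, n_y = len(xs), len(ys)
    n = n_x + n_y
    merged = list(xs) + list(ys)
    ranks = midranks(merged)
    r_x = sum(ranks[:n_x])                  # rank sum of the first sample
    u_x = r_x - n_x * (n_x + 1) / 2
    u_y = n_x * n_y - u_x                   # since U_X + U_Y = n_X * n_Y
    mean_u = n_x * n_y / 2
    ties = sum(t ** 3 - t for t in Counter(merged).values())
    sigma_u = math.sqrt(n_x * n_y / 12 * ((n + 1) - ties / (n * (n - 1))))
    z = (u_x - mean_u) / sigma_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return min(u_x, u_y), z, p_value


if __name__ == "__main__":
    print(mann_whitney_u([1.2, 3.4, 2.2, 5.1], [2.9, 6.3, 7.7, 4.8, 5.5]))
```

The degenerate case in which every element of both samples is identical makes the standard deviation vanish and is not handled by this sketch.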
Appendix 0.B Stouffer’s p-Value Aggregation Method
p-values can be aggregated in different ways. Stouffer observes that the Z-value defined by $Z_i = \Phi^{-1}(1 - p_i)$ is a standard normal variable under $H_0$, where $\Phi$ is the standard normal CDF. Hence, when the p-values $p_1, \ldots, p_k$ are translated into $Z_1, \ldots, Z_k$, we get a collection of independent and identically distributed standard normal variables under $H_0$. To combine the effect of all tests, we sum all the $Z_i$; this sum follows a normal distribution under $H_0$ with mean $0$ and variance $k$. The global test statistic
$$Z = \frac{1}{\sqrt{k}} \sum_{i=1}^{k} Z_i$$
is hence standard normal under $H_0$ and can thus be reconverted into a p-value using the standard normal table.
However, these calculations imply that the underlying joint distribution is known, and the derivation of the percentiles of the combination statistic requires a numerical approximation.
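As a minimal sketch of the aggregation recalled in this appendix, the following Python function converts each p-value into a standard normal score, sums the scores, rescales by the square root of their number and converts the result back into a p-value. It assumes independent p-values lying strictly between 0 and 1; the function name `stouffer` is an illustrative choice.

```python
import math
from statistics import NormalDist


def stouffer(p_values):
    """Stouffer's method for independent p-values strictly between 0 and 1."""
    std = NormalDist()
    z_scores = [std.inv_cdf(1 - p) for p in p_values]   # Z_i = Phi^-1(1 - p_i)
    z = sum(z_scores) / math.sqrt(len(z_scores))        # standard normal under H0
    return 1 - std.cdf(z)                               # back to a p-value


if __name__ == "__main__":
    print(stouffer([0.04, 0.20, 0.10]))
```

A single p-value is returned unchanged (up to rounding), and several moderately small p-values reinforce each other: two p-values of 0.05 combine into a value below 0.05.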