Machine learning and pattern recognition techniques are increasingly being adopted in security applications such as spam, intrusion, and malware detection, although their security against adversarial attacks has not yet been deeply understood. In adversarial settings, indeed, intelligent and adaptive attackers may carefully target the machine learning components of a system to compromise its security. Several distinct attack scenarios have been considered in a recent field of study, known as adversarial machine learning [1, 2, 3, 4]. For instance, it has been shown that it is possible to gradually poison a spam filter, an intrusion detection system, and even a biometric verification system (in general, a classification algorithm) by exploiting update mechanisms that enable the adversary to manipulate some of the training data [5, 6, 7, 8, 9, 10, 11, 12, 13]; and that the detection of malicious samples by linear and even some classes of non-linear classifiers can be evaded with few targeted manipulations that reflect a proper change in their feature values [14, 13, 15, 16, 17]. Recently, poisoning and evasion attacks against clustering algorithms have also been formalized, showing that malware clustering approaches can be significantly vulnerable to well-crafted attacks [18, 19].
Research in adversarial learning not only investigates the security properties of learning algorithms against well-crafted attacks, but also focuses on the development of more secure learning algorithms. For evasion attacks, this has mainly been achieved by explicitly embedding into the learning algorithm knowledge of the possible data manipulations that the attacker can perform, e.g., using game-theoretical models for classification [15, 20, 21, 22], probabilistic models of the data distribution drift under attack [23, 24], and even multiple classifier systems [25, 26, 27]. Poisoning attacks and manipulation of the training data have instead been countered with data sanitization (i.e., a form of outlier detection) [28, 5, 6], multiple classifier systems, and robust statistics. Robust statistics have also been exploited to formally show that the influence function of SVM-like algorithms can be bounded under certain conditions, e.g., if the kernel is bounded. This ensures some degree of robustness against small perturbations of the training data, and it may thus also be desirable for improving the security of learning algorithms against poisoning.
In this work, we investigate the vulnerability of SVMs to a specific kind of training data manipulation, i.e., worst-case label noise. This can be regarded as a carefully-crafted attack in which the labels of a subset of the training data are flipped to maximize the SVM’s classification error. While stochastic label noise has been widely studied in the machine learning literature to account for different kinds of potential labeling errors in the training data [31, 32], only a few works have considered adversarial, worst-case label noise, either from a more theoretical or a more practical perspective [34, 35]. In [31, 33], the impact of stochastic and adversarial label noise on the classification error has been theoretically analyzed under the probably-approximately-correct learning model, deriving lower bounds on the classification error as a function of the fraction of flipped labels, for both stochastic and adversarial label noise. In recent work [34, 35], instead, we have focused on deriving more practical attack strategies to maximize the test error of an SVM given a maximum number of allowed label flips in the training data. Since finding the worst label flips is generally computationally demanding, we have devised suitable heuristics to find approximate solutions efficiently. To our knowledge, these are the only works devoted to understanding how SVMs can be affected by adversarial label noise.
From a more practical viewpoint, the problem is of interest as attackers may concretely have access to, and change, some of the training labels in a number of cases. For instance, if feedback from end users is exploited to label data and update the system, as in collaborative spam filtering, an attacker may have access to an authorized account (e.g., an email account protected by the same anti-spam filter), and manipulate the labels assigned to her samples. In other cases, a system may even directly ask users to validate its decisions on some submitted samples, and use them to update the classifier (see, e.g., PDFRate, an online tool for detecting PDF malware, available at: http://pdfrate.com).
The practical relevance of poisoning attacks has also been recently discussed in the context of the detection of malicious crowdsourcing websites that connect paying users with workers willing to carry out malicious campaigns (e.g., spam campaigns in social networks), a recent phenomenon referred to as crowdturfing. In fact, administrators of crowdturfing sites can intentionally pollute the training data used to learn classifiers, as it comes from their own websites, and are thus able to launch poisoning attacks.
In this paper, we extend our work on adversarial label noise against SVMs [34, 35] by improving our previously-defined attacks (Sects. 3.1 and 3.3), and by proposing two novel heuristic approaches. One has been inspired by previous work on SVM poisoning and incremental learning [38, 39], and makes use of a continuous relaxation of the label values to greedily maximize the SVM’s test error through gradient ascent (Sect. 3.2). The other exploits a breadth-first search to greedily construct sets of candidate label flips that are correlated in their effect on the test error (Sect. 3.4). As in [34, 35], we aim at analyzing the maximum performance degradation incurred by an SVM under adversarial label noise, to assess whether these attacks can be considered a relevant threat. We thus assume that the attacker has perfect knowledge of the attacked system and of the training data, and leave the investigation of how to develop such attacks with limited knowledge of the training data to future work. We further assume that the adversary incurs the same cost for flipping each label, independently of the corresponding data point. We demonstrate the effectiveness of the proposed approaches by reporting experiments on synthetic and real-world datasets (Sect. 4). We conclude in Sect. 5 with a discussion of the contributions of our work, its limitations, and future research, also related to the application of the proposed techniques to other fields, including semi-supervised and active learning.
2 Support Vector Machines and Notation
We revisit here structural risk minimization and SVM learning, and introduce the framework that will be used to motivate our attack strategies for adversarial label noise.
In risk minimization, the goal is to find a hypothesis that represents an unknown relationship between an input space and an output space, captured by a probability measure. Given a non-negative loss function assessing the error between the prediction provided by the hypothesis and the true output, we can define the optimal hypothesis as the one that minimizes the expected risk over the hypothesis space. Although the underlying probability measure is not usually known, and thus the expected risk cannot be computed directly, a set of i.i.d. samples drawn from it is often available. In these cases, a learning algorithm can be used to find a suitable hypothesis. According to structural risk minimization, the learner minimizes the sum of a regularizer and the empirical risk over the data:
where the regularizer penalizes excessive hypothesis complexity to avoid overfitting, the empirical risk is the average loss over the training samples, and a regularization parameter controls the trade-off between minimizing the empirical loss and the complexity of the hypothesis.
The SVM is an example of a binary linear classifier developed according to the aforementioned principle. It makes predictions based on the sign of its real-valued discriminant function; i.e., a sample is classified as positive if the discriminant function is non-negative, and negative otherwise. The SVM uses the hinge loss as a convex surrogate loss function, and a quadratic regularizer on the weights. Thus, SVM learning can be formulated according to Eq. (1) as the following convex quadratic programming problem:
An interesting property of SVMs arises from their dual formulation, which only requires computing inner products between samples during training and classification, thus avoiding the need for an explicit feature representation. Accordingly, non-linear decision functions in the input space can be learned using kernels, i.e., inner products in implicitly-mapped feature spaces. In this case, the SVM’s decision function is given as a kernel expansion over the training samples, where the kernel function computes the inner product between implicitly-mapped samples. The SVM’s dual parameters are found by solving the dual problem:
where the quadratic term involves the label-annotated version of the (training) kernel matrix. The bias is obtained from the corresponding Karush-Kuhn-Tucker (KKT) conditions, so as to satisfy the equality constraint of the dual problem.
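To make the kernel expansion concrete, the following sketch reconstructs the SVM's decision function from the learnt dual coefficients and bias, and checks it against the library's own output. The toy dataset, RBF kernel parameter, and scikit-learn API usage are illustrative choices on our part, not part of the original formulation (the paper's experiments use LibSVM, on which scikit-learn's `SVC` is built).

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(40, 2)
y = np.sign(X[:, 0] + X[:, 1])  # illustrative binary labels in {-1, +1}

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# scikit-learn stores alpha_i * y_i for the support vectors in dual_coef_
sv = clf.support_vectors_
dual = clf.dual_coef_.ravel()

def decision(x):
    # f(x) = sum_i alpha_i y_i k(x_i, x) + b, summed over support vectors
    k = np.exp(-gamma * np.sum((sv - x) ** 2, axis=1))
    return dual @ k + clf.intercept_[0]

x_test = np.array([0.3, -0.1])
assert np.isclose(decision(x_test), clf.decision_function([x_test])[0])
```

The check confirms that the manually-evaluated kernel expansion matches the library's `decision_function` output.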
In this paper, however, we are not only interested in how the hypothesis is chosen, but also in how it performs on a second validation or test dataset, which may generally be drawn from a different distribution. We thus define the error measure
which implicitly uses the learning algorithm. This function evaluates the structural risk of a hypothesis that is trained on one dataset but evaluated on another, and will form the foundation of our label-flipping approaches to dataset poisoning. Moreover, since we are only concerned with label flips and their effect on the learner, we use a shorthand notation for the above error measure when the two datasets differ only in the labels used for training and those used for evaluation.
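A minimal sketch of this error measure follows: train a classifier on (possibly flipped) labels, then evaluate the regularized hinge loss of the resulting hyperplane against the original labels. The helper name `flip_error`, the toy data, and the use of scikit-learn's `LinearSVC` are our own illustrative choices (note that `LinearSVC` also regularizes the bias, a small deviation from the standard SVM formulation).

```python
import numpy as np
from sklearn.svm import LinearSVC

def flip_error(X, y_true, y_train, C=1.0):
    """Train on (X, y_train), then evaluate the regularized hinge loss
    of the learnt hyperplane against the original labels y_true."""
    clf = LinearSVC(C=C, loss="hinge", max_iter=100000).fit(X, y_train)
    w, b = clf.coef_.ravel(), clf.intercept_[0]
    hinge = np.maximum(0.0, 1.0 - y_true * (X @ w + b))
    return 0.5 * w @ w + C * hinge.sum()

rng = np.random.RandomState(0)
X = np.r_[rng.randn(20, 2) + 2, rng.randn(20, 2) - 2]
y = np.r_[np.ones(20), -np.ones(20)]
y_flip = y.copy()
y_flip[:4] = -y_flip[:4]  # taint the training labels with four flips

# the measure is a nonnegative regularized risk
assert flip_error(X, y, y_flip) >= 0.0
```

Comparing `flip_error(X, y, y)` with `flip_error(X, y, y_flip)` is exactly the kind of evaluation the attacks below try to maximize.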
3 Adversarial Label Flips on SVMs
In this paper, we aim to gain insight into whether and to what extent an SVM may be affected by the presence of well-crafted mislabeled instances in the training data. We assume the presence of an attacker whose goal is to cause a denial of service, i.e., to maximize the SVM’s classification error, by changing the labels of at most a fixed number of samples in the training set. Similarly to [35, 12], the problem can be formalized as follows.
We assume there is some learning algorithm, known to the adversary, that maps a training dataset into the hypothesis space. Although this could be any learning algorithm, we consider the SVM here, as discussed above. The adversary wants to maximize the classification error (i.e., the risk) that the learner is trying to minimize, by contaminating the training data so that the hypothesis is selected based on tainted data drawn from an adversarially selected distribution. However, the adversary’s capability of manipulating the training data is bounded by requiring the tainted distribution to be within a neighborhood of (i.e., “close to”) the original one.
For a worst-case label flip attack, the attacker is restricted to changing only the labels of training samples, and is allowed to change at most a fixed number of such labels in order to maximally increase the classification risk; this number bounds the attacker’s capability, and it is fixed a priori. Thus, the problem can be formulated as
where the indicator function returns one if its argument is true, and zero otherwise. However, as with the learner, the true risk cannot be assessed, because the underlying distribution is also unknown to the adversary. As with the learning paradigm described above, the risk used to select the tainted labels can be approximated by the regularized empirical risk with a convex loss. Thus, the objective in Eq. (5) becomes simply the regularized empirical risk where, notably, the empirical risk is measured with respect to the true dataset with the original labels. For the SVM and the hinge loss, this yields the following program:
where the SVM’s dual discriminant function is learned on the tainted data with the flipped labels.
The above optimization is an NP-hard subset selection problem, which includes SVM learning as a subproblem. In the next sections, we present a set of heuristic methods to find approximate solutions to the posed problem efficiently. In particular, in Sect. 3.1 we revise the previously-proposed alfa attack according to the aforementioned framework; in Sect. 3.2 we present a novel approach for adversarial label flips based on a continuous relaxation of Problem (6); in Sect. 3.3 we present an improved, modified version of the approach we originally proposed in earlier work; and in Sect. 3.4 we finally present another novel approach for adversarial label flips, which aims to flip clusters of labels that are ‘correlated’ in their effect on the objective function.
3.1 Adversarial Label Flip Attack (alfa)
We revise here the near-optimal label flip attack proposed in our previous work, named Adversarial Label Flip Attack (alfa). It is formulated under the assumption that the attacker can maliciously manipulate the set of labels to maximize the empirical loss of the original classifier on the tainted dataset, while the classification algorithm, unaware of the attack, still aims to preserve its generalization capability on the tainted dataset. As a consequence, the attack misleads the classifier into an erroneous shift of the decision boundary, which maximally deviates from the boundary learnt on the untainted data.
As discussed above, given the untainted dataset, the adversary aims to flip at most a fixed number of labels to form the tainted labels that maximize the error measure. Alternatively, we can pose this problem as a search for the labels that achieve the maximum difference between the empirical risks of classifiers trained on the tainted and the untainted data, respectively. The attacker’s objective can thus be expressed as
To solve this problem, we note that the loss-dependent component of the objective is a sum of losses over the data points, and that the evaluation set only differs in its labels. Thus, for each data point, either the loss on the original label or the loss on the flipped label contributes to the risk. By introducing an indicator variable that denotes which component to use, the attacker’s objective can be rewritten as the problem of minimizing the following expression with respect to the classifier and the indicator variables:
In this expression, the dataset is effectively duplicated, and for each sample either the original or the flipped label is selected. The indicator variables are used to select an optimal subset of labels to be flipped.
When alfa is applied to the SVM, we use the hinge loss and the primal SVM formulation from Eq. (2). For each sample, we consider the loss of the original classifier when the label is respectively kept unchanged or flipped, together with the corresponding slack variables of the new classifier. The above attack framework can then be expressed as:
To avoid integer programming, which is generally NP-hard, the indicator variables are relaxed to be continuous on the unit interval. The minimization problem in Eq. (8) is then decomposed into two iterative sub-problems. First, by fixing the indicator variables, the summands involving them are constant, and the minimization thus reduces to the following QP problem:
Second, fixing the classifier yields a set of fixed hinge losses. The minimization over the (continuous) indicator variables is then a linear programming problem (LP):
After convergence of this iterative approach, the indicator variables with the largest values identify the near-optimal label flips within the given budget. The complete alfa procedure is given as Algorithm 1.
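A much-simplified sketch of this alternating scheme is given below. Retraining plays the role of the QP step; since an LP with box constraints and a single budget constraint is solved by picking the best-scoring variables, the LP step reduces here to a sort over per-sample hinge-loss differences. The function name, greedy scoring rule, and toy data are our own simplifications, not the exact alfa formulation in Algorithm 1.

```python
import numpy as np
from sklearn.svm import SVC

def alfa_sketch(X, y, L, C=1.0, n_iter=5):
    """Simplified alternating label-flip attack: retrain the SVM (QP step),
    then select the L flips with the largest hinge-loss gain (the LP step,
    degenerate under a single budget constraint)."""
    y_taint = y.copy()
    for _ in range(n_iter):
        clf = SVC(kernel="linear", C=C).fit(X, y_taint)
        f = clf.decision_function(X)
        keep = np.maximum(0.0, 1.0 - y * f)  # hinge loss if label i is kept
        flip = np.maximum(0.0, 1.0 + y * f)  # hinge loss if label i is flipped
        idx = np.argsort(flip - keep)[-L:]   # flips with the largest gain
        y_taint = y.copy()
        y_taint[idx] = -y_taint[idx]
    return y_taint

rng = np.random.RandomState(0)
X = np.r_[rng.randn(20, 2) + 2, rng.randn(20, 2) - 2]
y = np.r_[np.ones(20), -np.ones(20)]
y_adv = alfa_sketch(X, y, L=4)
```

Note that the largest gains belong to confidently-classified points, matching the intuition that flipping well-classified samples drags the retrained hyperplane the furthest.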
3.2 ALFA with Continuous Label Relaxation (alfa-cr)
The underlying idea of the method presented in this section is to solve Problem (6) through a continuous relaxation. In particular, we relax the constraint that the tainted labels have to be discrete, and let them take on continuous real values over a bounded domain. We then maximize the objective function in Problem (6) with respect to the relaxed labels. Under this relaxation, we optimize the objective through a simple gradient-ascent algorithm, iteratively mapping the continuous labels back to discrete values during the ascent. The gradient derivation and the complete attack algorithm are respectively reported in Sects. 3.2.1 and 3.2.2.
3.2.1 Gradient Computation
Let us first compute the gradient of the objective in Eq. (6), starting from the loss-dependent term. Although this term is not differentiable at the hinge point, it is possible to consider a subgradient that is equal to the gradient of the hinge loss where the loss is positive, and to zero otherwise. The gradient of the loss-dependent term is thus given as:
where the indicator term equals one (zero) if the corresponding sample attains a positive (zero) hinge loss, and
where we explicitly account for the dependency on the training labels. To compute the gradient of the discriminant function, we differentiate this expression with respect to each label in the training data using the product rule:
This can be compactly rewritten in matrix form as:
where, using the numerator layout convention,
The expressions required to compute the gradient in Eq. (14) can be obtained by assuming that the SVM solution remains in equilibrium while the labels change smoothly. This can be expressed as an adiabatic update condition, using the technique introduced in [38, 39] and exploited in previous work on SVM poisoning for a similar gradient computation. Observe that, for the training samples, the KKT conditions for the optimal solution of the SVM training problem can be expressed as:
where we remind the reader of the form of the discriminant function in this case. The equality in conditions (15)-(16) implies that an infinitesimal change in the labels causes a smooth change in the optimal solution of the SVM, under the constraint that the sets of margin, error, and reserve vectors do not change. This allows us to predict the response of the SVM solution to the variation of the labels as follows.
where is an -by- diagonal matrix, whose elements if , and elsewhere.
The assumption that the SVM solution does not change structure while updating the labels implies that the margin conditions remain satisfied for the margin support vectors (from the equality in condition (15)). In the sequel, we will also use indices for the reserve vectors, the error vectors, and the whole set of training samples, respectively. The above assumption leads to the following linear problem, which allows us to predict how the SVM solution changes while the labels vary:
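The three index sets can be read directly off a trained SVM's dual variables. The sketch below partitions a toy problem's training points into margin support vectors (dual variable strictly between 0 and C), error vectors (at the upper bound C), and reserve vectors (at 0); the tolerance values, toy data, and scikit-learn usage are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
X = np.r_[rng.randn(30, 2) + 1.5, rng.randn(30, 2) - 1.5]
y = np.r_[np.ones(30), -np.ones(30)]
C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# recover alpha_i for all training points (zero for non-support vectors)
alpha = np.zeros(len(y))
alpha[clf.support_] = np.abs(clf.dual_coef_.ravel())  # alpha_i = |alpha_i y_i|

tol = 1e-6
margin_sv = np.where((alpha > tol) & (alpha < C - tol))[0]  # 0 < alpha < C
error_sv = np.where(alpha >= C - tol)[0]                    # alpha = C
reserve = np.where(alpha <= tol)[0]                         # alpha = 0

# the three sets partition the training data
assert len(margin_sv) + len(error_sv) + len(reserve) == len(y)
# margin support vectors lie (approximately) on the margin: y_i f(x_i) = 1
assert np.allclose(y[margin_sv] * clf.decision_function(X[margin_sv]), 1.0,
                   atol=1e-2)
```

The equilibrium argument in the text amounts to requiring that no point crosses between these three sets during an infinitesimal label change.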
The first matrix can be inverted using matrix block inversion:
Substituting this result to solve Problem (20), one obtains:
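As an aside, the block inversion step can be sketched numerically via the Schur complement; the partition sizes below are arbitrary, and the matrix is a random well-conditioned example rather than the actual KKT matrix.

```python
import numpy as np

def block_inverse(A, B, C, D):
    """Invert [[A, B], [C, D]] via the Schur complement S = D - C A^{-1} B."""
    Ainv = np.linalg.inv(A)
    S = D - C @ Ainv @ B
    Sinv = np.linalg.inv(S)
    return np.block([
        [Ainv + Ainv @ B @ Sinv @ C @ Ainv, -Ainv @ B @ Sinv],
        [-Sinv @ C @ Ainv, Sinv],
    ])

rng = np.random.RandomState(0)
M = rng.randn(5, 5) + 5 * np.eye(5)  # well-conditioned test matrix
A, B, C, D = M[:2, :2], M[:2, 2:], M[2:, :2], M[2:, 2:]
assert np.allclose(block_inverse(A, B, C, D), np.linalg.inv(M))
```

This is useful in the incremental setting because only the (small) Schur complement needs to be re-inverted when one block changes.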
The assumption that the structure of the three sets is preserved also implies that the dual variables of the reserve and error vectors remain constant. Therefore, the first term in Eq. (14) can be simplified as:
As for the regularization term, the gradient can be simply computed as:
Thus, the complete gradient of the objective in Problem (6) is:
The structure of the SVM (i.e., the composition of the three sets) will clearly change while updating the labels; hence, after each gradient step, we should re-compute the optimal SVM solution along with its corresponding structure. This can be done by re-training the SVM from scratch at each iteration. Alternatively, since our changes are smooth, the SVM solution can be updated more efficiently at each iteration using an active-set optimization algorithm, initialized with the values obtained from the previous iteration as a warm start. Efficiency may be further improved by developing an ad hoc incremental SVM under label perturbations based on the above equations. This, however, requires the development of suitable bookkeeping conditions, similarly to [38, 39], and is thus left to future investigation.
Our attack algorithm for alfa-cr is given as Algorithm 2. It exploits the gradient derivation reported in the previous section to maximize the objective function with respect to continuous label values. The current best set of continuous labels is iteratively mapped to the discrete label set, adding one label flip at a time, until the budget of flips is reached.
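Since off-the-shelf SVM solvers cannot train on continuous labels, the sketch below replaces the analytic gradient with a finite-difference proxy: at each step it flips the single label whose flip most increases the hinge loss measured on the original labels. This captures the greedy ascent flavor of alfa-cr, but not its actual relaxation and gradient computation; all names and the toy data are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def hinge_on_true(X, y_true, y_train, C=1.0):
    """Hinge loss on the true labels after training on y_train."""
    clf = SVC(kernel="linear", C=C).fit(X, y_train)
    f = clf.decision_function(X)
    return np.maximum(0.0, 1.0 - y_true * f).sum()

def alfa_cr_sketch(X, y, L, C=1.0):
    """Greedy coordinate-ascent stand-in for the continuous relaxation:
    add one label flip at a time, choosing the steepest finite-difference
    increase of the attacker's objective."""
    y_taint, flipped = y.copy(), set()
    for _ in range(L):
        base = hinge_on_true(X, y, y_taint, C)
        gains = {}
        for i in set(range(len(y))) - flipped:
            y_try = y_taint.copy()
            y_try[i] = -y_try[i]
            gains[i] = hinge_on_true(X, y, y_try, C) - base
        best = max(gains, key=gains.get)
        y_taint[best] = -y_taint[best]
        flipped.add(best)
    return y_taint

rng = np.random.RandomState(0)
X = np.r_[rng.randn(15, 2) + 2, rng.randn(15, 2) - 2]
y = np.r_[np.ones(15), -np.ones(15)]
y_adv = alfa_cr_sketch(X, y, L=2)
```

This brute-force variant retrains once per candidate flip, which is exactly the cost the analytic gradient of Sect. 3.2.1 is designed to avoid.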
3.3 ALFA based on Hyperplane Tilting (alfa-tilt)
We now propose a modified version of the adversarial label flip attack presented in our earlier work. The underlying idea of the original strategy is to generate different candidate sets of label flips according to a given heuristic method (explained below), and to retain the one that maximizes the test error, similarly to the objective of Problem (6).
However, instead of maximizing the test error directly, here we consider a surrogate measure inspired by our previous work, in which we have shown that, under the agnostic assumption that the data is uniformly distributed in feature space, the SVM’s robustness against label flips can be related to the change in the angle between the hyperplane obtained in the absence of attack and that learnt on the tainted data with label flips. Accordingly, the alfa-tilt strategy considered here aims to maximize the following quantity:
where the inner product between any two hyperplanes is computed in feature space through the kernel matrix and the SVM’s dual coefficients learnt from the untainted and the tainted data, respectively.
Candidate label flips are generated as follows. Labels are flipped with non-uniform probabilities, depending on how well the corresponding training samples are classified by the SVM learned on the untainted training set. We thus increase the probability of flipping the labels of reserve vectors (as they are reliably classified), and decrease the probability of label flips for margin and error vectors (inversely proportionally to their loss). The former are indeed more likely to become margin or error vectors for the SVM learnt on the tainted training set, and the resulting hyperplane will therefore be closer to them. This will in turn induce a significant change in the SVM solution and, potentially, in its test error. We further flip the labels of samples in different classes in a correlated way, to force the hyperplane to rotate as much as possible. To this aim, we draw a random hyperplane in feature space, and further increase the probability of flipping the label of a positive (respectively, negative) sample depending on which side of this random hyperplane it lies.
The full implementation of alfa-tilt is given as Algorithm 3. It depends on two parameters, which tune the probability of flipping a point’s label based on how well it is classified, and on how well the flip is correlated with the other considered flips. As suggested in our previous work, both can be set to fixed values that have given reasonable results on several datasets.
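The surrogate itself can be computed directly from the dual coefficients, since inner products between hyperplanes in feature space reduce to kernel evaluations: `<w, w'> = (alpha * y)^T K (alpha' * y')`. The sketch below computes the cosine of the tilt angle between two learnt hyperplanes; the toy data, RBF kernel, and scikit-learn conventions are all illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def tilt_cosine(X, y_a, y_b, gamma=0.5, C=1.0):
    """Cosine of the angle between the hyperplanes learnt on labels y_a
    and y_b, computed in feature space via the kernel matrix."""
    K = rbf_kernel(X, X, gamma=gamma)

    def dual_vector(y):
        clf = SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)
        v = np.zeros(len(y))
        v[clf.support_] = clf.dual_coef_.ravel()  # entries alpha_i * y_i
        return v

    va, vb = dual_vector(y_a), dual_vector(y_b)
    return (va @ K @ vb) / np.sqrt((va @ K @ va) * (vb @ K @ vb))

rng = np.random.RandomState(0)
X = np.r_[rng.randn(20, 2) + 2, rng.randn(20, 2) - 2]
y = np.r_[np.ones(20), -np.ones(20)]

# identical label sets yield identical hyperplanes, hence cosine 1
assert np.isclose(tilt_cosine(X, y, y), 1.0)
```

alfa-tilt would evaluate this quantity for each candidate flip set and retain the set that tilts the hyperplane the most (i.e., minimizes the cosine).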
3.4 Correlated Clusters
Here, we explore a different approach that heuristically optimizes the objective using a breadth-first search to greedily construct subsets (or clusters) of label flips that are ‘correlated’ in their effect on the test error; we use the term correlation loosely.
The algorithm starts by assessing how each singleton flip impacts the objective, and proceeds by randomly sampling a set of initial singleton flips to serve as initial clusters. For each of these clusters, we select a random set of mutations (i.e., a mutation is a change to a single flip in the cluster), which we then evaluate (using the empirical 0-1 loss) to form a score matrix. This matrix is then used to select the best mutation among the set of evaluated mutations. Clusters are thus grown to maximally increase the empirical risk.
To make the algorithm tractable, the population of candidate clusters is kept small: periodically, the set of clusters is pruned to a fixed population size by discarding the worst-evaluated clusters. Whenever a new cluster achieves the highest empirical error, that cluster is recorded as the best candidate. Further, if a cluster grows beyond the flip budget, the best deleterious mutation is applied until the cluster contains no more than the allowed number of flips. This overall process of greedily growing clusters according to the best observed random mutations continues for a set number of iterations, at which point the best flips found so far are returned. Pseudocode for the correlated clusters algorithm is given in Algorithm 4.
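The search described above can be sketched as follows. This is a deliberately simplified version of Algorithm 4: mutations are single-flip toggles, the population is pruned every iteration, and all names, population sizes, and toy data are our own illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

def emp_error(X, y, flips, C=1.0):
    """Empirical 0-1 error on the true labels after training on flipped ones."""
    y_t = y.copy()
    idx = list(flips)
    y_t[idx] = -y_t[idx]
    return np.mean(SVC(kernel="linear", C=C).fit(X, y_t).predict(X) != y)

def correlated_clusters_sketch(X, y, L, pop=4, n_mut=6, n_iter=5, seed=0):
    """Greedy breadth-first search over small sets of correlated flips:
    grow a population of flip clusters by random single-flip mutations,
    prune to the best `pop` clusters, and track the best cluster seen."""
    rng = np.random.RandomState(seed)
    n = len(y)
    clusters = [frozenset([i]) for i in rng.choice(n, pop, replace=False)]
    best = max(clusters, key=lambda c: emp_error(X, y, c))
    for _ in range(n_iter):
        cand = set()
        for c in clusters:
            for i in rng.choice(n, n_mut, replace=False):
                m = c ^ frozenset([i])       # toggle one flip in the cluster
                if 0 < len(m) <= L:          # respect the flip budget
                    cand.add(m)
        clusters = sorted(cand, key=lambda c: emp_error(X, y, c))[-pop:]
        if emp_error(X, y, clusters[-1]) > emp_error(X, y, best):
            best = clusters[-1]
    return sorted(best)

rng = np.random.RandomState(0)
X = np.r_[rng.randn(20, 2) + 2, rng.randn(20, 2) - 2]
y = np.r_[np.ones(20), -np.ones(20)]
flips = correlated_clusters_sketch(X, y, L=3)
```

The budget check inside the mutation loop stands in for the "best deleterious mutation" shrinking step of the full algorithm.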
4 Experiments
We evaluate the adversarial effects of the various attack strategies against SVMs on both synthetic and real-world datasets. Experiments on synthetic datasets provide a conceptual representation of the rationale according to which the proposed attack strategies select the label flips. Their effectiveness, and the security of SVMs against adversarial label flips, are then more systematically assessed on different real-world datasets.
4.1 On Synthetic Datasets
To intuitively understand the fundamental strategies and differences of each of the proposed adversarial label flip attacks, we report here an experimental evaluation on two bi-dimensional datasets, where the positive and negative samples can be perfectly separated by a linear and a parabolic decision boundary, respectively (data available at http://home.comcast.net/~tom.fawcett/public_html/ML-gallery/pages/index.html). For these experiments, we learn SVMs with the linear and the RBF kernel on both datasets, using LibSVM. We set the regularization and kernel parameters based on some preliminary experiments. For each dataset, we randomly select a set of training samples, and evaluate the test error on a disjoint set of samples. The proposed attacks are used to flip 10% of the labels in the training data, and the SVM model is subsequently learned on the tainted training set. Besides the four proposed attack strategies for adversarial label noise, three further attack strategies are evaluated for comparison, respectively referred to as farfirst, nearest, and random. In farfirst and nearest, only the labels of the samples farthest from and nearest to the decision boundary are respectively flipped; in random, training labels are flipped at random. To mitigate the effect of randomization, each random attack retains the best label flips over a number of repetitions.
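The three baselines can be sketched in a few lines, taking the absolute value of the SVM's decision function as the distance to the boundary. The function name, toy usage, and scikit-learn API are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from sklearn.svm import SVC

def baseline_flips(X, y, L, strategy, C=1.0, seed=0):
    """Label-flip baselines: 'farfirst' flips the L samples farthest from
    the decision boundary, 'nearest' the L closest, 'random' a random
    subset of L samples."""
    rng = np.random.RandomState(seed)
    if strategy == "random":
        idx = rng.choice(len(y), L, replace=False)
    else:
        clf = SVC(kernel="linear", C=C).fit(X, y)
        dist = np.abs(clf.decision_function(X))
        order = np.argsort(dist)
        idx = order[-L:] if strategy == "farfirst" else order[:L]
    y_taint = y.copy()
    y_taint[idx] = -y_taint[idx]
    return y_taint

rng = np.random.RandomState(0)
X = np.r_[rng.randn(20, 2) + 2, rng.randn(20, 2) - 2]
y = np.r_[np.ones(20), -np.ones(20)]
```

In the full random baseline, this selection would be repeated several times, keeping the draw that yields the highest test error.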
Results are reported in Fig. 1. First, note how the proposed attack strategies alfa, alfa-cr, alfa-tilt, and correlated clusters generally exhibit clearer patterns of flipped labels than those shown by farfirst, nearest, and random, indeed yielding higher error rates. In particular, when the RBF kernel is used, the SVM’s performance is significantly affected by a careful selection of training label flips (cf. the error rates between the plots in the first and those in the second row of Fig. 1). This somewhat contradicts previous results in which the use of bounded kernels has been advocated to improve the robustness of SVMs against training data perturbations. The reason is that, in this case, the attacker does not have the ability to make unconstrained modifications to the feature values of some training samples, but can only flip a limited number of labels. As a result, bounding the feature space through the use of bounded kernels is not helpful against label flip attacks. Furthermore, the security of SVMs may even be worsened by using a non-linear kernel, as it may be easier to significantly change (e.g., “bend”) a non-linear decision boundary using carefully-crafted label flips, thus leading to higher error rates. Amongst the attacks, correlated clusters shows the highest error rates when the linear (RBF) kernel is applied to the linearly-separable (parabolically-separable) data. In particular, when the RBF kernel is used on the parabolically-separable data, even a small fraction of label flips causes a substantial increase in the test error.