## 1 Introduction

Machine learning and pattern recognition techniques are increasingly being adopted in security applications like spam, intrusion, and malware detection, even though their security against adversarial attacks has not yet been deeply understood. In adversarial settings, indeed, intelligent and adaptive attackers may carefully target the machine learning components of a system to compromise its security. Several distinct attack scenarios have been considered in a recent field of study known as *adversarial machine learning* [1, 2, 3, 4]. For instance, it has been shown that it is possible to gradually *poison* a spam filter, an intrusion detection system, and even a biometric verification system (in general, a classification algorithm) by exploiting update mechanisms that enable the adversary to manipulate some of the training data [5, 6, 7, 8, 9, 10, 11, 12, 13]; and that the detection of malicious samples by linear and even some classes of non-linear classifiers can be *evaded* with a few targeted manipulations of their feature values [14, 13, 15, 16, 17]. Recently, poisoning and evasion attacks against clustering algorithms have also been formalized to show that malware clustering approaches can be significantly vulnerable to well-crafted attacks [18, 19].

Research in adversarial learning not only investigates the security
properties of learning algorithms against well-crafted
attacks, but it also focuses on the development of more secure learning
algorithms.
For evasion attacks, this has been mainly achieved by explicitly
embedding into the learning algorithm knowledge of the possible data
manipulations that can be performed by the attacker, *e.g.*, using
game-theoretical models for
classification [15, 20, 21, 22],
probabilistic models of the data distribution drift under
attack [23, 24],
and even multiple classifier systems [25, 26, 27].
Poisoning attacks and manipulation of the training data have instead been countered with data sanitization (*i.e.*, a form of outlier detection) [28, 5, 6], multiple classifier systems [29], and robust statistics [7]. Robust statistics have also been exploited to formally show that the *influence function* of SVM-like algorithms can be bounded under certain conditions [30], *e.g.*, if the kernel is bounded. This ensures some degree of robustness against small perturbations of the training data, and it may thus be desirable also for improving the security of learning algorithms against poisoning.

In this work, we investigate the vulnerability of SVMs to a specific
kind of training data manipulation, *i.e.*, worst-case label noise.
This can be regarded as a carefully-crafted attack in which the labels of a subset of the training data are flipped to maximize the SVM’s classification error. While stochastic label
noise has been widely studied in the machine learning literature, to
account for different kinds of potential labeling errors in the
training data [31, 32], only a few works have
considered adversarial, worst-case label noise, either from a more
theoretical [33] or practical
perspective [34, 35]. In [31, 33],
the impact of stochastic and adversarial label noise on the
classification error has been theoretically analyzed under the
*probably-approximately-correct* learning model, deriving lower
bounds on the classification error as a function of the fraction of
flipped labels, for both stochastic and
adversarial label noise. In recent
work [34, 35], instead, we have focused on deriving
more practical attack strategies to maximize the test error of an SVM
given a maximum number of allowed label flips in the training data.
Since finding the worst label flips is generally computationally demanding,
we have devised suitable heuristics to find
approximate solutions efficiently. To our knowledge, these are
the only works devoted to understanding how SVMs can be affected by
adversarial label noise.

From a more practical viewpoint, the problem is of interest as attackers may concretely gain access to, and change, some of the training labels in a number of cases. For instance, if feedback from end users is exploited to label data and update the system, as in collaborative spam filtering, an attacker may have access to an authorized account (*e.g.*, an email account protected by the same anti-spam filter), and manipulate the labels assigned to her samples. In other cases, a system may even directly ask users to validate its decisions on some submitted samples, and use them to update the classifier (see, *e.g.*, `PDFRate`, an online tool for detecting PDF malware, available at http://pdfrate.com [36]).
The practical relevance of poisoning attacks has also been recently discussed in the context of detecting malicious crowdsourcing websites that connect paying users with workers willing to carry out malicious campaigns (*e.g.*, spam campaigns in social networks), a recent phenomenon referred to as *crowdturfing*. In fact, administrators of crowdturfing sites can intentionally pollute the training data used to learn classifiers, as it comes from their websites, thus being able to launch poisoning attacks [37].

In this paper, we extend our work on adversarial label noise against SVMs [34, 35] by improving our previously-defined attacks (Sects. 3.1 and 3.3), and by proposing two novel heuristic approaches. One is inspired by previous work on SVM poisoning [12] and incremental learning [38, 39], and makes use of a continuous relaxation of the label values to greedily maximize the SVM’s test error through gradient ascent (Sect. 3.2).
The other exploits a breadth-first search to greedily construct sets of candidate label flips that are *correlated* in their effect on the test error (Sect. 3.4).
As in [34, 35], we aim to analyze the maximum performance degradation incurred by an SVM under adversarial label noise, to assess whether these attacks can be considered a relevant threat. We thus assume that the attacker has perfect knowledge of the attacked system and of the training data, and leave the investigation of attacks with limited knowledge of the training data to future work.
We further assume that the adversary incurs the same cost for flipping each label, independently of the corresponding data point.
We demonstrate the effectiveness of the proposed approaches by reporting experiments on synthetic and real-world datasets (Sect. 4). We conclude in Sect. 5 with a discussion on the contributions of our work, its limitations, and future research, also related to the application of the proposed techniques to other fields, including semi-supervised and active learning.

## 2 Support Vector Machines and Notation

We revisit here structural risk minimization and SVM learning, and introduce the framework that will be used to motivate our attack strategies for adversarial label noise.

In risk minimization, the goal is to find a hypothesis $f : \mathcal{X} \to \mathcal{Y}$ that represents an unknown relationship between an input space $\mathcal{X}$ and an output space $\mathcal{Y}$, captured by a probability measure $P$.
Given a non-negative *loss function* $\ell(f(x), y)$ assessing the error between the prediction $f(x)$ provided by $f$ and the true output $y$, we can define the optimal hypothesis $f^{\ast}$ as the one that minimizes the
expected risk $R(f, P) = \mathbb{E}_{(x,y) \sim P}\, \ell(f(x), y)$ over the hypothesis space $\mathcal{F}$, *i.e.*, $f^{\ast} = \operatorname{arg\,min}_{f \in \mathcal{F}} R(f, P)$.
Although $P$ is not usually known, and thus $f^{\ast}$ cannot be computed directly, a set $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$ of i.i.d. samples drawn from $P$ is often available. In these cases, a learning algorithm can be used to find a suitable hypothesis.
According to structural risk minimization [40], the learner minimizes a sum of a
regularizer and the empirical risk over the data:

$$\hat{f} \,=\, \operatorname*{arg\,min}_{f \in \mathcal{F}} \;\; \Omega(f) + C\, \hat{R}(f, \mathcal{D}) \,, \tag{1}$$

where the regularizer $\Omega(f)$ is used to penalize excessive hypothesis complexity and avoid overfitting, the empirical risk is given by $\hat{R}(f, \mathcal{D}) = \sum_{i=1}^{n} \ell(f(x_i), y_i)$, and $C$ is a parameter that controls the trade-off between minimizing the empirical loss and the complexity of the hypothesis.

The SVM is an example of a binary linear classifier developed according to the aforementioned principle. It makes predictions in $\mathcal{Y} = \{-1, +1\}$ based on the sign of its real-valued discriminant function $g(x) = w^{\top} x + b$; *i.e.*, $x$ is classified as positive if $g(x) \geq 0$, and negative otherwise.
The SVM uses the hinge loss $\ell(g(x), y) = \max\left(0, 1 - y\, g(x)\right)$ as a convex surrogate loss function, and
a quadratic regularizer on $w$, *i.e.*,
$\Omega(f) = \tfrac{1}{2}\|w\|^{2}$.
Thus, SVM learning can be formulated according to
Eq. (1) as the following convex quadratic
programming problem:

$$\min_{w,\, b} \;\; \tfrac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \max\left(0,\, 1 - y_i\,(w^{\top} x_i + b)\right) \,. \tag{2}$$

An interesting property of SVMs arises from their *dual*
formulation, which only requires computing inner products between samples during training and classification, thus avoiding the need of an *explicit* feature representation.
Accordingly, non-linear decision functions in the input space can be learned using *kernels*, *i.e.*, inner products in implicitly-mapped feature spaces.
In this case, the SVM’s decision
function is given as $g(x) = \sum_{i=1}^{n} \alpha_i\, y_i\, k(x, x_i) + b$, where $k(x, x') = \phi(x)^{\top}\phi(x')$ is the kernel function, and $\phi$ the implicit mapping. The SVM’s dual parameters $\alpha = (\alpha_1, \ldots, \alpha_n)$ are found by solving the dual problem:

$$\min_{0 \,\leq\, \alpha \,\leq\, C} \;\; \tfrac{1}{2}\, \alpha^{\top} Q\, \alpha - \mathbf{1}^{\top} \alpha \,, \qquad \text{s.t.} \;\; y^{\top} \alpha = 0 \,, \tag{3}$$

where $Q = K \circ y\, y^{\top}$ is the
label-annotated version of the (training) kernel matrix $K$. The bias $b$ is obtained from the corresponding Karush-Kuhn-Tucker (KKT) conditions,
to satisfy the corresponding equality constraints (see, *e.g.*, [41]).
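
To make the dual representation concrete, the following minimal sketch (our own illustration, using scikit-learn's `SVC` with an arbitrary choice of parameters) trains a kernel SVM and verifies that the expansion $g(x) = \sum_i \alpha_i y_i k(x, x_i) + b$ reproduces the library's decision function.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Toy data: two Gaussian blobs with labels in {-1, +1}.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + 2, rng.randn(20, 2) - 2])
y = np.hstack([np.ones(20), -np.ones(20)])

# Train an SVM with the RBF kernel (the dual problem of Eq. (3) is solved internally).
clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)

# Recover the dual representation g(x) = sum_i alpha_i y_i k(x, x_i) + b.
# In scikit-learn, dual_coef_ stores alpha_i * y_i for the support vectors only.
alpha_times_y = clf.dual_coef_.ravel()
b = clf.intercept_[0]

# Check that the manual dual expansion matches decision_function().
K = rbf_kernel(X, clf.support_vectors_, gamma=0.5)
g_manual = K @ alpha_times_y + b
assert np.allclose(g_manual, clf.decision_function(X))
```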

In this paper, however, we are not only interested in how the hypothesis is chosen, but also in how it performs on a second validation or test dataset $\mathcal{D}_{\rm vd}$, which may generally be drawn from a different distribution. We thus define the error measure

$$E(\mathcal{D}_{\rm tr}, \mathcal{D}_{\rm vd}) \,=\, \Omega(\hat{f}_{\mathcal{D}_{\rm tr}}) + C \sum_{(x, y) \in \mathcal{D}_{\rm vd}} \ell\left(\hat{f}_{\mathcal{D}_{\rm tr}}(x), y\right) \,, \tag{4}$$

which implicitly uses the hypothesis $\hat{f}_{\mathcal{D}_{\rm tr}}$ learned on $\mathcal{D}_{\rm tr}$. This
function evaluates the structural risk of a
hypothesis that is *trained* on $\mathcal{D}_{\rm tr}$ but
*evaluated* on $\mathcal{D}_{\rm vd}$, and will form the foundation for
our label-flipping approaches to dataset poisoning. Moreover, since
we are only concerned with label flips and their effect on the learner,
we use the notation $E(z, y)$ to denote the above
error measure when the two datasets differ only in the labels
$z = (z_1, \ldots, z_n)$ used for training and $y = (y_1, \ldots, y_n)$ used for evaluation; *i.e.*, $\mathcal{D}_{\rm tr} = \{(x_i, z_i)\}_{i=1}^{n}$ and $\mathcal{D}_{\rm vd} = \{(x_i, y_i)\}_{i=1}^{n}$.
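
As an illustration, $E(z, y)$ can be evaluated by training on one set of labels and scoring the regularized hinge loss on another. The sketch below is our own; it assumes a linear kernel so that $\Omega(f) = \tfrac{1}{2}\|w\|^{2}$ can be read directly from the learned weights, and the function name is hypothetical rather than from any library.

```python
import numpy as np
from sklearn.svm import SVC

def error_measure(X, z, y, C=1.0):
    """Sketch of E(z, y): train an SVM on labels z, then evaluate its
    regularized hinge loss on the original labels y (cf. Eq. (4))."""
    clf = SVC(kernel="linear", C=C).fit(X, z)
    g = clf.decision_function(X)                 # discriminant g(x_k)
    hinge = np.maximum(0.0, 1.0 - y * g).sum()   # empirical hinge loss on y
    w = clf.coef_.ravel()
    omega = 0.5 * w @ w                          # regularizer 0.5 * ||w||^2
    return omega + C * hinge
```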

## 3 Adversarial Label Flips on SVMs

In this paper we aim at gaining some insights on whether and to what
extent an SVM may be affected by the presence of
well-crafted mislabeled instances in the training data. We assume the
presence of an attacker whose goal is to cause a denial of service, *i.e.*, to maximize the SVM’s
classification error, by changing the labels of at most $L$ samples
in the training set. Similarly to [35, 12], the
problem can be formalized as follows.

We assume there is some learning algorithm,
known to the adversary, that maps a training dataset into the
hypothesis space, *i.e.*, $\mathcal{D}_{\rm tr} \mapsto \hat{f}_{\mathcal{D}_{\rm tr}} \in \mathcal{F}$. Although this could be any learning algorithm, we
consider the SVM here, as discussed above.
The adversary wants to maximize the classification error (*i.e.*, the risk) that the learner is trying to minimize, by contaminating the training
data so that the hypothesis is selected based on tainted data drawn
from an adversarially selected distribution $P'$ over $\mathcal{X} \times \mathcal{Y}$. However, the adversary’s capability of manipulating the training data is bounded by requiring $P'$ to be within a
neighborhood of (*i.e.*, “close
to”) the original distribution $P$.

For a worst-case label flip attack, the attacker is restricted to only
change the labels of training samples in $\mathcal{D}_{\rm tr}$, and is allowed to
change at most $L$ such labels in order to maximally increase the
classification risk of the learned hypothesis; $L$ bounds the attacker’s
capability, and it is fixed *a priori*. Thus, the problem can be
formulated as

$$\begin{aligned}
\max_{z \,\in\, \{-1,+1\}^{n}} \;\; & R(\hat{f}_{z}, P) \\
\text{s.t.} \;\; & \hat{f}_{z} \,=\, \operatorname*{arg\,min}_{f \in \mathcal{F}} \;\Omega(f) + C \sum_{i=1}^{n} \ell(f(x_i), z_i) \,, \\
& \textstyle\sum_{i=1}^{n} \mathbb{I}[z_i \neq y_i] \,\leq\, L \,,
\end{aligned} \tag{5}$$

where $\mathbb{I}[\cdot]$ is the indicator function, which returns one if the argument is true, and zero otherwise. However, as with the learner, the true risk cannot be assessed because $P$ is also unknown to the adversary. As with the learning paradigm described above, the risk used to select the tainted labels can be approximated by the regularized empirical risk with a convex loss. Thus the objective in Eq. (5) becomes simply $E(z, y)$, where, notably, the empirical risk is measured with respect to the true dataset with the original labels $y$. For the SVM and the hinge loss, this yields the following program:

$$\begin{aligned}
\max_{z \,\in\, \{-1,+1\}^{n}} \;\; & E(z, y) \,=\, \Omega(\hat{f}_{z}) + C \sum_{i=1}^{n} \max\left(0,\, 1 - y_i\, g_{z}(x_i)\right) \\
\text{s.t.} \;\; & \textstyle\sum_{i=1}^{n} \mathbb{I}[z_i \neq y_i] \,\leq\, L \,,
\end{aligned} \tag{6}$$

where $g_{z}(x) = \sum_{j=1}^{n} \alpha_j\, z_j\, k(x, x_j) + b$ is the SVM’s dual discriminant function learned on the tainted data with labels $z$.

The above optimization is an NP-hard subset selection problem that includes SVM learning as a subproblem. In the next sections we present a set of heuristic methods to find approximate solutions to the posed problem efficiently. In particular, in Sect. 3.1 we revise the approach proposed in [35] according to the aforementioned framework, in Sect. 3.2 we present a novel approach for adversarial label flips based on a continuous relaxation of Problem (6), in Sect. 3.3 we present an improved, modified version of the approach we originally proposed in [34], and in Sect. 3.4 we finally present another novel approach for adversarial label flips that aims to flip clusters of labels that are ‘correlated’ in their effect on the objective function.
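
For intuition, Problem (6) can be solved exactly by brute force on toy problems: enumerate every subset of at most $L$ flips, retrain, and keep the subset with the largest objective. The sketch below (our own, reusing the `error_measure` helper sketched in Sect. 2, and feasible only for very small $n$ and $L$) illustrates the combinatorial search that the following heuristics approximate.

```python
from itertools import combinations
import numpy as np

def worst_flips_bruteforce(X, y, L, C=1.0):
    """Exhaustively search all subsets of at most L label flips and return
    the one maximizing E(z, y); only feasible for tiny n and L."""
    best_obj, best_idx = -np.inf, ()
    for k in range(1, L + 1):
        for idx in combinations(range(len(y)), k):
            z = y.copy()
            z[list(idx)] *= -1                  # flip the selected labels
            obj = error_measure(X, z, y, C)     # E(z, y), as sketched in Sect. 2
            if obj > best_obj:
                best_obj, best_idx = obj, idx
    return best_idx, best_obj
```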

### 3.1 Adversarial Label Flip Attack (alfa)

We revise here the near-optimal label flip attack proposed in [35], named Adversarial Label Flip Attack (alfa). It is formulated under the assumption that the attacker can maliciously manipulate the set of labels to maximize the empirical loss of the original classifier on the tainted dataset, while the classification algorithm, unaware of the attack, preserves its generalization capability on the tainted dataset. As a consequence, the classifier is misled into an erroneous shift of the decision boundary that deviates most from the one supported by the untainted, original data distribution.

As discussed above, given the untainted dataset $\mathcal{D}_{\rm tr}$ with labels $y$, the adversary aims to flip at most $L$ labels to form the tainted labels $z$ that maximize $E(z, y)$. Alternatively, we can pose this problem as a search for labels $z$ that achieve the maximum difference between the empirical risk attained on the tainted labels by the classifier trained on the untainted labels and that attained by the classifier trained on the tainted labels themselves. The attacker’s objective can thus be expressed as

$$\begin{aligned}
\max_{z \,\in\, \{-1,+1\}^{n}} \;\; & E(y, z) - E(z, z) \\
\text{s.t.} \;\; & \textstyle\sum_{i=1}^{n} \mathbb{I}[z_i \neq y_i] \,\leq\, L \,.
\end{aligned} \tag{7}$$

To solve this problem, we note that the loss-dependent component of $E$ is a sum of losses over the data points, and the evaluation set only differs in its labels. Thus, for each data point $x_i$, either a component $\ell(g(x_i), y_i)$ (label kept) or a component $\ell(g(x_i), -y_i)$ (label flipped) contributes to the risk. By denoting with $q_i \in \{0, 1\}$ an indicator variable specifying which component to use, the attacker’s objective can be rewritten as the problem of minimizing the following expression with respect to $f$ and $q$:

$$\Omega(f) + C \sum_{i=1}^{n} \left[ (1 - q_i)\left(\ell(g(x_i), y_i) - \ell(g^{0}(x_i), y_i)\right) + q_i\left(\ell(g(x_i), -y_i) - \ell(g^{0}(x_i), -y_i)\right) \right] ,$$

where $g$ and $g^{0}$ are the discriminant functions of the new and of the original classifier, respectively. In this expression, the dataset is effectively duplicated, and either $(x_i, y_i)$ or $(x_i, -y_i)$ is selected for the tainted training set. The variables $q$ are used to select an optimal subset of at most $L$ labels to be flipped so as to optimize the above expression.

When alfa is applied to the SVM, we use the hinge loss and the primal SVM formulation from Eq. (2). We denote with $\epsilon_i^{0}$ and $\epsilon_i^{1}$ the hinge loss of the original classifier $f^{0}$ on $x_i$ when its label is respectively kept unchanged or flipped. Similarly, $\xi_i^{0}$ and $\xi_i^{1}$ are the corresponding slack variables for the new classifier $f$. The above attack framework can then be expressed as:

$$\begin{aligned}
\min_{q,\, w,\, b,\, \xi} \;\; & \tfrac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \left[ (1 - q_i)\left(\xi_i^{0} - \epsilon_i^{0}\right) + q_i\left(\xi_i^{1} - \epsilon_i^{1}\right) \right] \\
\text{s.t.} \;\; & \xi_i^{0} \geq 1 - y_i\,(w^{\top} x_i + b) \,, \quad \xi_i^{0} \geq 0 \,, \\
& \xi_i^{1} \geq 1 + y_i\,(w^{\top} x_i + b) \,, \quad \xi_i^{1} \geq 0 \,, \\
& \textstyle\sum_{i=1}^{n} q_i \leq L \,, \quad q_i \in \{0, 1\} \,.
\end{aligned} \tag{8}$$

To avoid integer programming, which is generally NP-hard, the indicator variables $q_i$ are relaxed to be continuous on $[0, 1]$. The minimization problem in Eq. (8) is then decomposed into two iterative sub-problems. First, by fixing $q$, the summands $\epsilon_i^{0}$ and $\epsilon_i^{1}$ are constant, and thus the minimization reduces to the following QP problem:

$$\begin{aligned}
\min_{w,\, b,\, \xi} \;\; & \tfrac{1}{2}\|w\|^{2} + C \sum_{i=1}^{n} \left[ (1 - q_i)\, \xi_i^{0} + q_i\, \xi_i^{1} \right] \\
\text{s.t.} \;\; & \xi_i^{0} \geq 1 - y_i\,(w^{\top} x_i + b) \,, \quad \xi_i^{0} \geq 0 \,, \\
& \xi_i^{1} \geq 1 + y_i\,(w^{\top} x_i + b) \,, \quad \xi_i^{1} \geq 0 \,.
\end{aligned} \tag{9}$$

Second, fixing $w$ and $b$ yields a set of fixed hinge losses $\xi_i^{0}, \xi_i^{1}$ and $\epsilon_i^{0}, \epsilon_i^{1}$. The minimization over the (continuous) variables $q$
is then a linear programming problem (LP):

$$\begin{aligned}
\min_{q \,\in\, [0, 1]^{n}} \;\; & \sum_{i=1}^{n} \left[ (1 - q_i)\left(\xi_i^{0} - \epsilon_i^{0}\right) + q_i\left(\xi_i^{1} - \epsilon_i^{1}\right) \right] \\
\text{s.t.} \;\; & \textstyle\sum_{i=1}^{n} q_i \leq L \,.
\end{aligned} \tag{10}$$

After convergence of this iterative approach, the $L$ largest values of $q$ correspond to the near-optimal label flips within the budget $L$. The complete alfa procedure is given as Algorithm 1.
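
The following sketch illustrates the alternation behind alfa in a simplified form (our own simplification, not the reference implementation): the QP step is delegated to a standard SVM solver on the current candidate labels, and, since the LP of Eq. (10) only has box and budget constraints, its solution is obtained directly by ranking the cost coefficients.

```python
import numpy as np
from sklearn.svm import SVC

def hinge(g, labels):
    return np.maximum(0.0, 1.0 - labels * g)

def alfa_sketch(X, y, L, C=1.0, n_iter=10):
    """Simplified sketch of the alfa alternation: retrain the SVM on the
    current candidate labels (QP step), then re-select which labels to flip
    (LP step). Notation follows Sect. 3.1."""
    clf0 = SVC(kernel="linear", C=C).fit(X, y)
    g0 = clf0.decision_function(X)
    eps0, eps1 = hinge(g0, y), hinge(g0, -y)   # losses of f^0, label kept / flipped
    q = np.zeros(len(y))                        # flip indicators
    for _ in range(n_iter):
        z = np.where(q > 0.5, -y, y)            # current tainted labels
        clf = SVC(kernel="linear", C=C).fit(X, z)
        g = clf.decision_function(X)
        xi0, xi1 = hinge(g, y), hinge(g, -y)    # slacks of the new classifier
        # LP step: with box and budget constraints only, the optimum is to
        # flip the L labels with the most negative cost coefficients.
        cost = (xi1 - eps1) - (xi0 - eps0)
        q = np.zeros(len(y))
        q[np.argsort(cost)[:L]] = 1.0
    return np.flatnonzero(q)                    # indices of labels to flip
```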

### 3.2 ALFA with Continuous Label Relaxation (alfa-cr)

The underlying idea of the method presented in this section is to solve Problem (6) using a continuous relaxation. In particular, we relax the constraint that the tainted labels $z$ have to be discrete, and let them take on continuous real values on a bounded domain. We thus maximize the objective function in Problem (6) with respect to the continuous labels $z$. Under this relaxation, we optimize the objective through a simple gradient-ascent algorithm, and iteratively map the continuous labels back to discrete values during the gradient ascent. The gradient derivation and the complete attack algorithm are respectively reported in Sects. 3.2.1 and 3.2.2.

#### 3.2.1 Gradient Computation

Let us first compute the gradient of the objective in Eq. (6), starting from the loss-dependent term $\sum_{k} \max\left(0, 1 - y_k\, g(x_k)\right)$. Although this term is not differentiable when $y_k\, g(x_k) = 1$, it is possible to consider a subgradient that is equal to the gradient of $1 - y_k\, g(x_k)$ when $y_k\, g(x_k) < 1$, and to $0$ otherwise. The gradient of the loss-dependent term is thus given as:

$$\frac{\partial}{\partial z} \sum_{k=1}^{n} \max\left(0,\, 1 - y_k\, g(x_k)\right) \,=\, -\sum_{k=1}^{n} \delta_k\, y_k\, \frac{\partial g(x_k)}{\partial z} \,, \tag{11}$$

where $\delta_k = 1$ ($0$) if $y_k\, g(x_k) < 1$ (otherwise), and

$$g(x_k) \,=\, \sum_{j=1}^{n} \alpha_j\, z_j\, k(x_k, x_j) + b \,, \tag{12}$$

where we explicitly account for the dependency on $z$. To compute the gradient of $g$, we derive this expression with respect to each label $z_i$ in the training data using the product rule:

$$\frac{\partial g(x_k)}{\partial z_i} \,=\, \sum_{j=1}^{n} k(x_k, x_j)\, z_j\, \frac{\partial \alpha_j}{\partial z_i} \,+\, \alpha_i\, k(x_k, x_i) \,+\, \frac{\partial b}{\partial z_i} \,. \tag{13}$$

This can be compactly rewritten in matrix form as:

$$\frac{\partial g}{\partial z} \,=\, K Z\, \frac{\partial \alpha}{\partial z} \,+\, K \Lambda \,+\, \mathbf{1}\, \frac{\partial b}{\partial z} \,, \tag{14}$$

where, using the numerator layout convention, $\frac{\partial g}{\partial z}$ and $\frac{\partial \alpha}{\partial z}$ are the $n \times n$ matrices with entries $\frac{\partial g(x_k)}{\partial z_i}$ and $\frac{\partial \alpha_j}{\partial z_i}$, respectively, $\frac{\partial b}{\partial z}$ is the $1 \times n$ row vector of derivatives of $b$, $K$ is the training kernel matrix, $Z = \operatorname{diag}(z)$, $\Lambda = \operatorname{diag}(\alpha)$, and $\mathbf{1}$ is the all-ones column vector.

The expressions for $\frac{\partial \alpha}{\partial z}$ and $\frac{\partial b}{\partial z}$ required to compute the gradient in Eq. (14) can be obtained by assuming that the SVM solution remains in equilibrium while $z$ changes smoothly. This can be expressed as an adiabatic update condition, using the technique introduced in [38, 39], and exploited in [12] for a similar gradient computation. Observe that, for the *training* samples, the KKT conditions for the optimal solution of the SVM training problem can be expressed as:

$$z_i\, g(x_i) - 1 \;
\begin{cases}
\; > 0 \,, & \alpha_i = 0 \quad (i \in \mathcal{R}) \\
\; = 0 \,, & 0 < \alpha_i < C \quad (i \in \mathcal{S}) \\
\; < 0 \,, & \alpha_i = C \quad (i \in \mathcal{E})
\end{cases} \tag{15}$$

$$\sum_{i=1}^{n} \alpha_i\, z_i \,=\, 0 \,, \tag{16}$$

where we remind the reader that, in this case, the SVM is trained on the tainted labels $z$, *i.e.*, $g(x) = \sum_{j} \alpha_j\, z_j\, k(x, x_j) + b$.
The equality conditions in (15)-(16) imply that an infinitesimal change in $z$ causes a smooth change in the optimal solution of the SVM, under the constraint that the sets $\mathcal{S}$, $\mathcal{E}$, and $\mathcal{R}$ do not change. This allows us to predict the *response* of the SVM solution
to the variation of $z$ as follows.

By differentiation of Eqs. (15)-(16), we obtain:

$$\sum_{j=1}^{n} Q_{ij}\, \frac{\partial \alpha_j}{\partial z_k} \,+\, z_i\, \frac{\partial b}{\partial z_k} \,+\, z_i \left( \mathbb{I}[i = k] + \alpha_k\, k(x_i, x_k) \right) \,=\, 0 \,, \qquad \forall\, i \in \mathcal{S} \,, \tag{17}$$

$$\alpha_k \,+\, \sum_{j=1}^{n} z_j\, \frac{\partial \alpha_j}{\partial z_k} \,=\, 0 \,, \qquad k = 1, \ldots, n \,, \tag{18}$$

where $Q = K \circ z\, z^{\top}$ is the $n$-by-$n$ label-annotated kernel matrix computed with the tainted labels, *i.e.*, $Q_{ij} = z_i\, k(x_i, x_j)\, z_j$.

The assumption that the SVM solution does not change structure while updating $z$ implies that

$$\frac{\partial}{\partial z}\left( z_s\, g(x_s) \right) \,=\, 0 \,, \qquad \forall\, s \in \mathcal{S} \,, \tag{19}$$

where $s$ indexes the *margin* support vectors in $\mathcal{S}$ (from the equality in condition 15).
In the sequel, we will also use $r$, $e$, and $t$, respectively, to index the *reserve* vectors in $\mathcal{R}$, the *error* vectors in $\mathcal{E}$, and all the training samples.
The above assumption leads to the following linear problem, which allows us to predict how the SVM solution changes while $z$ varies (subscripts denote restrictions of vectors and matrices to the corresponding index sets, *e.g.*, $z_{\mathcal{S}}$, $Q_{\mathcal{S}\mathcal{S}}$, and $K_{\mathcal{S}t}$):

$$\begin{bmatrix} 0 & z_{\mathcal{S}}^{\top} \\ z_{\mathcal{S}} & Q_{\mathcal{S}\mathcal{S}} \end{bmatrix}
\begin{bmatrix} \frac{\partial b}{\partial z} \\[4pt] \frac{\partial \alpha_{\mathcal{S}}}{\partial z} \end{bmatrix}
\,=\, -
\begin{bmatrix} \alpha^{\top} \\ Z_{\mathcal{S}\mathcal{S}} \left( I_{\mathcal{S}t} + K_{\mathcal{S}t}\, \Lambda \right) \end{bmatrix} \,, \tag{20}$$

where $I_{\mathcal{S}t}$ denotes the rows of the $n$-by-$n$ identity matrix indexed by $\mathcal{S}$.

The first matrix can be inverted using matrix block inversion [42]:

$$\begin{bmatrix} 0 & z_{\mathcal{S}}^{\top} \\ z_{\mathcal{S}} & Q_{\mathcal{S}\mathcal{S}} \end{bmatrix}^{-1}
\,=\, \frac{1}{\zeta}
\begin{bmatrix} -1 & \upsilon^{\top} \\ \upsilon & \zeta\, Q_{\mathcal{S}\mathcal{S}}^{-1} - \upsilon\, \upsilon^{\top} \end{bmatrix} \,, \tag{21}$$

where $\upsilon = Q_{\mathcal{S}\mathcal{S}}^{-1} z_{\mathcal{S}}$ and $\zeta = z_{\mathcal{S}}^{\top} Q_{\mathcal{S}\mathcal{S}}^{-1} z_{\mathcal{S}}$. Substituting this result to solve Problem (20), one obtains:

$$\frac{\partial b}{\partial z} \,=\, \frac{1}{\zeta} \left[\, \alpha^{\top} - \upsilon^{\top} Z_{\mathcal{S}\mathcal{S}} \left( I_{\mathcal{S}t} + K_{\mathcal{S}t}\, \Lambda \right) \right] \,, \tag{22}$$

$$\frac{\partial \alpha_{\mathcal{S}}}{\partial z} \,=\, -\frac{1}{\zeta} \left[\, \upsilon\, \alpha^{\top} + \left( \zeta\, Q_{\mathcal{S}\mathcal{S}}^{-1} - \upsilon\, \upsilon^{\top} \right) Z_{\mathcal{S}\mathcal{S}} \left( I_{\mathcal{S}t} + K_{\mathcal{S}t}\, \Lambda \right) \right] \,. \tag{23}$$

The assumption that the structure of the three sets is preserved also implies that $\frac{\partial \alpha_r}{\partial z} = 0$ for $r \in \mathcal{R}$ and $\frac{\partial \alpha_e}{\partial z} = 0$ for $e \in \mathcal{E}$. Therefore, the first term in Eq. (14) can be simplified as:

$$K Z\, \frac{\partial \alpha}{\partial z} \,=\, K_{t\mathcal{S}}\, Z_{\mathcal{S}\mathcal{S}}\, \frac{\partial \alpha_{\mathcal{S}}}{\partial z} \,. \tag{24}$$

Eqs. (22) and (23) can now be substituted into Eq. (24), and further into Eq. (11), to compute the gradient of the loss-dependent term of our objective function.

As for the regularization term, the gradient can be simply computed as:

$$\frac{\partial \Omega}{\partial z} \,=\, \alpha^{\top} Z K \left( Z\, \frac{\partial \alpha}{\partial z} + \Lambda \right) \,. \tag{25}$$

Thus, the complete gradient of the objective in Problem (6) is:

$$\frac{\partial E(z, y)}{\partial z} \,=\, \frac{\partial \Omega}{\partial z} \,-\, C \sum_{k=1}^{n} \delta_k\, y_k\, \frac{\partial g(x_k)}{\partial z} \,. \tag{26}$$

The structure of the SVM solution (*i.e.*, the sets $\mathcal{S}$, $\mathcal{E}$, and $\mathcal{R}$) will clearly change while updating $z$; hence, after each gradient step we should re-compute the optimal SVM solution along with its corresponding structure.
This can be done by re-training the SVM from scratch at each iteration.
Alternatively, since our changes are *smooth*, the SVM solution can be more efficiently updated at each iteration using an active-set optimization algorithm initialized with the values obtained from the previous iteration as a warm start [43]. Efficiency may be further improved by developing an ad hoc incremental SVM under label perturbations based on the above equations. This however includes the development of suitable bookkeeping conditions, similarly to [38, 39], and it is thus left to future investigation.
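
A numerical sketch of the gradient in Eq. (26) is given below, evaluated at the current (discrete) labels. It follows the equations above under our notation, uses a least-squares solve for robustness when $Q_{\mathcal{S}\mathcal{S}}$ is ill-conditioned, and is meant only as an illustration (the helper name and parameter choices are ours).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def label_gradient(X, z, y, C=1.0, gamma=0.5, tol=1e-6):
    """Sketch of the gradient of E(z, y) w.r.t. the training labels z (Eq. (26)),
    evaluated at the current labels, following Eqs. (11)-(25)."""
    n = len(y)
    K = rbf_kernel(X, X, gamma=gamma)
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X, z)

    # Full-length vector of dual coefficients alpha_i (zeros for non-SVs).
    alpha = np.zeros(n)
    alpha[clf.support_] = np.abs(clf.dual_coef_.ravel())
    b = clf.intercept_[0]
    g = K @ (alpha * z) + b                       # discriminant g(x_k), Eq. (12)

    # Margin support vectors S: 0 < alpha_i < C (within tolerance).
    S = np.where((alpha > tol) & (alpha < C - tol))[0]
    dg_dz = K * alpha[None, :]                    # the K * Lambda term of Eq. (14)
    dOmega_dz = (alpha * z) @ (K * alpha[None, :])
    if len(S) > 0:
        # Linear system of Eq. (20): predict how (b, alpha_S) respond to z.
        Q_SS = (z[S, None] * K[np.ix_(S, S)]) * z[None, S]
        A = np.zeros((1 + len(S), 1 + len(S)))
        A[0, 1:], A[1:, 0], A[1:, 1:] = z[S], z[S], Q_SS
        rhs = -np.vstack([alpha[None, :],
                          z[S, None] * (np.eye(n)[S] + K[S] * alpha[None, :])])
        sol = np.linalg.lstsq(A, rhs, rcond=None)[0]
        db_dz, dalpha_S_dz = sol[0], sol[1:]
        # Remaining terms of Eq. (14), restricted to S as in Eq. (24).
        resp = K[:, S] @ (z[S, None] * dalpha_S_dz)
        dg_dz = dg_dz + resp + np.outer(np.ones(n), db_dz)
        dOmega_dz = dOmega_dz + (alpha * z) @ resp          # Eq. (25)

    # Loss-dependent term, Eq. (11): subgradient of the hinge loss on labels y.
    delta = (y * g < 1).astype(float)
    dloss_dz = -(delta * y) @ dg_dz
    return dOmega_dz + C * dloss_dz               # Eq. (26)
```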

#### 3.2.2 Algorithm

Our attack algorithm for `alfa-cr` is given as Algorithm 2. It exploits the gradient derivation reported in the previous section to maximize the objective function with respect to continuous values of the labels $z$. The current best set of continuous labels is iteratively mapped to the discrete set $\{-1, +1\}^{n}$, adding one label flip at a time, until $L$ flips are obtained.
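
As a simplified illustration of how the gradient can drive the attack, the sketch below implements a greedy variant (our own simplification, not Algorithm 2 itself, which performs a full continuous ascent with iterative rounding): at each step it ranks candidate flips by the first-order gain predicted by Eq. (26), reusing the hypothetical `label_gradient` helper sketched in Sect. 3.2.1, and commits one flip at a time.

```python
import numpy as np

def alfa_cr_greedy(X, y, L, C=1.0, gamma=0.5):
    """Greedy variant of alfa-cr: at each step, compute the gradient of
    E(z, y) w.r.t. the labels (Eq. (26)) and flip the not-yet-flipped label
    whose flip direction has the largest positive first-order gain."""
    z = y.copy().astype(float)
    flipped = []
    for _ in range(L):
        grad = label_gradient(X, z, y, C=C, gamma=gamma)   # sketch above
        # Flipping label i changes z_i by -2*z_i; the first-order gain of
        # the flip is therefore -2 * z_i * grad_i.
        gain = -2.0 * z * grad
        gain[flipped] = -np.inf                            # do not flip twice
        i = int(np.argmax(gain))
        z[i] = -z[i]
        flipped.append(i)
    return np.array(flipped)
```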

### 3.3 ALFA based on Hyperplane Tilting (alfa-tilt)

We now propose a modified version of the adversarial label flip attack we presented in [34]. The underlying idea of the original strategy is to generate different candidate sets of label flips according to a given heuristic method (explained below), and retain the one that maximizes the test error, similarly to the objective of Problem (6).
However, instead of maximizing the test error directly, here we consider a surrogate measure, inspired by our work in [44]. In that work, we have shown that, under the *agnostic* assumption that the data is uniformly distributed in feature space, the SVM’s robustness against label flips can be related to the change in the angle between the hyperplane $w_y$ obtained in the absence of attack and the hyperplane $w_z$ learnt on the tainted data with label flips $z$. Accordingly, the `alfa-tilt` strategy considered here aims to maximize the tilt between the two hyperplanes, *i.e.*, to minimize their alignment:

$$\cos(w_z, w_y) \,=\, \frac{\langle w_z, w_y \rangle}{\| w_z \| \, \| w_y \|} \,, \tag{27}$$

where $\langle w_u, w_v \rangle = \sum_{i,j} \alpha_i^{(u)}\, u_i\, k(x_i, x_j)\, v_j\, \alpha_j^{(v)}$, being $u$ and $v$ any two sets of training labels, and $\alpha^{(y)}$ and $\alpha^{(z)}$ are the SVM’s dual coefficients learnt from the untainted and the tainted data, respectively.

Candidate label flips are generated as explained in [34]. Labels are flipped with non-uniform probabilities, depending on how well the corresponding training samples are classified by the SVM learned on the untainted training set. We thus increase the probability of flipping the labels of reserve vectors (as they are reliably classified), and decrease it for margin and error vectors (*i.e.*, the flipping probability grows with $y_i\, g(x_i)$). The former are indeed more likely to become margin or error vectors for the SVM learnt on the tainted training set, and, therefore, the resulting hyperplane will be closer to them. This will in turn induce a significant change in the SVM solution, and, potentially, in its test error. We further flip labels of samples in different classes in a correlated way, to force the hyperplane to rotate as much as possible. To this aim, we draw a random hyperplane $w_{\rm rnd}$ in feature space, and further increase the probability of flipping the label of a positive sample (respectively, a negative one) if it lies on the negative (respectively, positive) side of $w_{\rm rnd}$.

The full implementation of `alfa-tilt` is given as Algorithm 3. It depends on two parameters, which tune the probability of flipping a point’s label based on how well it is classified, and on how well the flip is correlated with the other considered flips.
As suggested in [34], they can be set to the default values given therein, since this configuration has given reasonable results on several datasets.
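
The surrogate in Eq. (27) is cheap to evaluate from the dual solutions alone, since $\|w_u\|^{2} = \sum_{i,j} \alpha_i^{(u)} u_i\, k(x_i, x_j)\, u_j\, \alpha_j^{(u)}$. A minimal sketch follows (with dual coefficient vectors padded with zeros for non-support vectors; the function name is ours); `alfa-tilt` then simply retains, among the randomly generated candidate sets of flips, the one yielding the smallest value of this alignment.

```python
import numpy as np

def hyperplane_cosine(alpha_u, u, alpha_v, v, K):
    """Alignment (cosine of the angle) between the SVM hyperplanes learnt with
    label vectors u and v, computed from the dual coefficients and the training
    kernel matrix K, as in Eq. (27)."""
    wu, wv = alpha_u * u, alpha_v * v          # expansion coefficients of w_u, w_v
    dot = wu @ K @ wv
    return dot / np.sqrt((wu @ K @ wu) * (wv @ K @ wv))
```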

### 3.4 Correlated Clusters

Here, we explore a different approach to heuristically optimizing
$E(z, y)$, which uses a breadth-first search to greedily
construct subsets (or *clusters*) of label flips that are ‘correlated’
in their effect on $E(z, y)$. Here, we use the term *correlation*
loosely.

The algorithm starts by assessing how each singleton flip impacts
$E(z, y)$, and proceeds by randomly sampling a set of initial
singleton flips to serve as initial clusters. For each of these
clusters, we select a random set of mutations to it (*i.e.*, a
mutation is a change to a single flip in the cluster), which we then
evaluate (using the empirical 0-1 loss) to form a matrix of candidate
scores. This matrix is then used to select the best mutation to make
among the set of evaluated mutations. Clusters are thus grown to
maximally increase the empirical risk.

To make the algorithm tractable, the population of candidate clusters
is kept small. Periodically, the set of clusters is pruned to keep
the population to a fixed size by discarding the worst evaluated clusters.
Whenever a new cluster achieves the highest empirical error, that
cluster is recorded as being the best candidate cluster. Further, if
clusters grow beyond the limit of $L$ flips, the best *deleterious*
mutation is applied until the cluster only has $L$ flips. This overall
process of greedily creating clusters with respect to the best
observed random mutations continues for a set number of iterations,
at which point the best flips found up to that point are returned.
Pseudocode for the correlated clusters algorithm is given in
Algorithm 4.
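
A much-simplified sketch of this search is given below (our own condensation: it mutates each cluster once per iteration and prunes by re-evaluation, whereas Algorithm 4 scores a whole matrix of candidate mutations and applies the best deleterious mutation when a cluster exceeds $L$ flips).

```python
import numpy as np
from sklearn.svm import SVC

def eval_flips(X, y, flips, C=1.0):
    """Empirical 0-1 error on the original labels after training on flipped ones."""
    z = y.copy()
    z[list(flips)] *= -1
    clf = SVC(kernel="linear", C=C).fit(X, z)
    return np.mean(clf.predict(X) != y)

def correlated_clusters_sketch(X, y, L, pop=5, n_iter=50, C=1.0, rng=None):
    """Simplified correlated-clusters search: greedily grow a small population
    of flip sets by random single-flip mutations, keeping the best-scoring set.
    Each evaluation retrains the SVM; results should be cached in practice."""
    rng = rng or np.random.RandomState(0)
    n = len(y)
    clusters = [frozenset([int(i)]) for i in rng.choice(n, size=pop, replace=False)]
    best = max(clusters, key=lambda c: eval_flips(X, y, c, C))
    for _ in range(n_iter):
        new_clusters = []
        for c in clusters:
            m = set(c) ^ {int(rng.randint(n))}   # mutation: toggle a single flip
            while len(m) > L:                    # crude enforcement of the budget L
                m.remove(next(iter(m)))
            new_clusters.append(frozenset(m))
        clusters = sorted(set(clusters + new_clusters),
                          key=lambda c: eval_flips(X, y, c, C), reverse=True)[:pop]
        if eval_flips(X, y, clusters[0], C) > eval_flips(X, y, best, C):
            best = clusters[0]
    return sorted(best)
```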

## 4 Experiments

We evaluate the adversarial effects of various attack strategies against SVMs on both synthetic and real-world datasets. Experiments on synthetic datasets provide a conceptual representation of the rationale according to which the proposed attack strategies select the label flips. Their effectiveness, and the security of SVMs against adversarial label flips, are then more systematically assessed on different real-world datasets.

### 4.1 On Synthetic Datasets

To intuitively understand the fundamental strategies and differences of each of the proposed adversarial label flip attacks, we report here an experimental evaluation on two bi-dimensional datasets, in which the positive and the negative samples can be perfectly separated by a linear and a parabolic decision boundary, respectively (data available at http://home.comcast.net/~tom.fawcett/public_html/ML-gallery/pages/index.html).
For these experiments, we learn SVMs with the linear and the RBF kernel on both datasets, using LibSVM [41]. We set the regularization parameter $C$ and the kernel parameter $\gamma$ based on some preliminary experiments. For each dataset, we randomly select a set of training samples, and evaluate the test error on a disjoint set of samples. The proposed attacks are used to flip a fraction of 10% of the labels in the training data, and the SVM model is subsequently learned on the tainted training set.
Besides the four proposed attack strategies for adversarial label noise, three further attack strategies are evaluated for comparison, respectively referred to as farfirst, nearest, and random. As for farfirst and nearest, only the labels of the samples farthest from and nearest to the decision boundary are respectively flipped.
As for the random attack, training labels are randomly flipped. To mitigate the effect of randomization, each random attack selects the best label flips over a number of repetitions.
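
For reference, the three baselines can be implemented in a few lines; the sketch below is ours and assumes a linear kernel, using $|g(x_i)|$ as the proxy for the distance to the decision boundary.

```python
import numpy as np
from sklearn.svm import SVC

def baseline_flips(X, y, L, strategy="nearest", C=1.0, rng=None):
    """Baseline attacks of Sect. 4.1: flip the L labels nearest to / farthest
    from the untainted decision boundary, or L labels chosen at random."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = np.abs(clf.decision_function(X))         # distance proxy |g(x_i)|
    if strategy == "nearest":
        return np.argsort(margin)[:L]
    if strategy == "farfirst":
        return np.argsort(margin)[-L:]
    rng = rng or np.random.RandomState(0)
    return rng.choice(len(y), size=L, replace=False)   # random baseline
```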

Results are reported in Fig. 1.
First, note how the proposed attack strategies alfa, alfa-cr, alfa-tilt, and correlated cluster generally exhibit clearer patterns of flipped labels than those shown by farfirst, nearest, and random, yielding indeed higher error rates.
In particular, when the RBF kernel is used, the SVM’s performance is significantly affected by a careful selection of the training label flips (*cf.* the error rates between the plots in the first and those in the second row of Fig. 1). This somehow contradicts the result in [30], where the use of bounded kernels has been advocated to improve the robustness of SVMs against training data perturbations. The reason is that, in this case, the attacker does not have the ability to make unconstrained modifications to the feature values of some training samples, but can only flip a maximum of $L$ labels. As a result, bounding the feature space through the use of bounded kernels to counter label flip attacks is not helpful here.
Furthermore, the security of SVMs may be even worsened by using a non-linear kernel, as it may be easier to significantly change (*e.g.*, “bend”) a non-linear decision boundary using carefully-crafted label flips, thus leading to higher error rates.
Amongst the attacks, correlated cluster shows the highest error rates when the linear (RBF) kernel is applied to the linearly-separable (parabolically-separable) data. In particular, when the RBF kernel is used on the parabolically-separable data, even the considered small fraction of label flips suffices to increase the test error substantially.