In recent years, multiple papers have demonstrated that machine learning classifiers can be fooled by adversarial examples, i.e. inputs that are close to a correctly classified data point but are themselves classified incorrectly (Akhtar2018; Madry2018). The threat of such attacks is not to be underestimated, especially in security-critical applications such as medicine or autonomous driving, where adversarial examples could lead to misdiagnoses or crashes (Eykholt2018).
Despite this serious threat to all classification models, existing research has almost exclusively focused on image data (Akhtar2018; Madry2018), with the notable exceptions of a few contributions on audio data (Carlini2018), text data (Ebrahimi2018), and graph data (Dai2018; Zuegner2018). In particular, no adversarial attack approach has yet been developed for tree data, such as syntax trees of computer programs or biomedical molecules. Furthermore, all attack approaches for non-image data to date rely on knowledge about the classifier architecture and/or gradient, which may not always be available (Madry2018).
In this paper, we address both issues by introducing adversarial edit attacks, a novel black-box attack scheme for tree data. In particular, we propose to select, for a data point $\hat x$, a neighboring point $\hat y$ with a different label, compute the tree edits necessary to change $\hat x$ into $\hat y$, and apply the minimum number of edits which still changes the classifier output.
Our paper is structured as follows. We first introduce background and related work on adversarial examples, then introduce our adversarial attack method, and finally evaluate our method by attacking seven different tree classifiers on four tree data sets, two from the programming domain and two from the biomedical domain.
2 Related Work
Following Szegedy et al. (Szegedy2014), we define an adversarial example for some data point $x \in \mathcal{X}$, a classifier $f: \mathcal{X} \to \{1, \dots, L\}$, and a target label $t \neq f(x)$ as the solution $z$ of the following optimization problem:

$$\min_{z \in \mathcal{X}} \; d(x, z) \quad \text{s.t.} \quad f(z) = t \quad (1)$$

where $d$ is a distance on the data space $\mathcal{X}$. In other words, $z$ is the closest data point to $x$ which is still classified as $t$. For image data, the distance $d(x, z)$ is often so small that $x$ and $z$ look exactly the same to human observers (Szegedy2014).
Note that Problem 1 is hard to solve because $\mathcal{X}$ is typically high-dimensional and the constraint $f(z) = t$ is discrete. Accordingly, the problem has been addressed with heuristic approaches, such as the fast gradient sign method (Goodfellow2014), which changes $x$ along the sign of the gradient of the classifier loss; or Carlini-Wagner attacks, which incorporate the discrete label constraint as a differentiable term in the objective function (Carlini2017). We call these methods white-box because they rely on knowledge of the architecture and/or gradient of the classifier. In contrast, there also exist black-box attack methods, which only need to query the classifier $f$ itself, such as one-pixel attacks, which rely on evolutionary optimization instead of gradient-based optimization (Akhtar2018; Su2017).
In the realm of non-image data, prior research has exclusively focused on white-box attacks for specific data types and/or models. In particular, (Carlini2018) consider audio files, relying on decibels and the CTC loss as measures of distance; (Ebrahimi2018) attack text data by inferring single-character replacements that increase the classification loss; and (Dai2018; Zuegner2018) attack graph data by inferring edge deletions or insertions which fool a graph convolutional neural network model.
Our own approach is related to (Carlini2018), in that we rely on an alignment between two inputs to construct adversarial examples, and to (Ebrahimi2018), in that we consider discrete node-level changes, i.e. node deletions, replacements, or insertions. However, in contrast to these prior works, our approach is black-box instead of white-box, and it works on tree data as well as sequence data.
3 Method

To develop an adversarial attack scheme for tree data, we face two challenges. First, Problem 1 requires a distance function $d$ for trees. Second, we need a method to apply small changes to a tree $\hat x$ in order to construct an adversarial tree $\hat z$. We can address both challenges with the tree edit distance, which is defined as the minimum number of node deletions, replacements, or insertions needed to change one tree into another (Zhang1989), and which thus provides both a distance and a change model.
Formally, we define a tree $\hat x$ over some finite alphabet $\Sigma$ recursively as an expression $\hat x = x(\hat x_1, \dots, \hat x_k)$, where $x \in \Sigma$ and where $\hat x_1, \dots, \hat x_k$ is a (possibly empty) list of trees over $\Sigma$. We denote the set of all trees over $\Sigma$ as $\mathcal{T}(\Sigma)$. As an example, a, a(b), and a(b(a, a), a) are all trees over the alphabet $\Sigma = \{a, b\}$. We define the size of a tree recursively as $|x(\hat x_1, \dots, \hat x_k)| = 1 + \sum_{i=1}^k |\hat x_i|$.
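The recursive definition above can be mirrored directly in code. The following is a minimal sketch, assuming a simple (label, children) tuple representation; the helper names are ours, not part of any library.

```python
def tree(label, *children):
    """Builds a tree x(x_1, ..., x_k) as a (label, children) pair."""
    return (label, list(children))

def tree_size(t):
    """Size |x(x_1, ..., x_k)| = 1 + sum of the sizes of all subtrees."""
    label, children = t
    return 1 + sum(tree_size(c) for c in children)

# a(b(a, a), a) over the alphabet {a, b} has size 5
example = tree('a', tree('b', tree('a'), tree('a')), tree('a'))
```

Here `tree_size(example)` returns 5: one node for the root plus the sizes of its two subtrees.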
Next, we define a tree edit $\delta$ over alphabet $\Sigma$ as a function $\delta: \mathcal{T}(\Sigma) \to \mathcal{T}(\Sigma)$. In more detail, we consider node deletions $\mathrm{del}_i$, replacements $\mathrm{rep}_{i,y}$, and insertions $\mathrm{ins}_{i,c,y}$, which respectively delete the $i$th node in the input tree and move its children up to the parent, relabel the $i$th node in the input tree with symbol $y \in \Sigma$, and insert a new node with label $y$ as $c$th child of node $i$, moving former children down. Figure 1 displays the effects of each edit type.
We define an edit script $\bar\delta = \delta_1, \dots, \delta_n$ as a sequence of tree edits $\delta_j$ and we define the application of $\bar\delta$ to a tree $\hat x$ recursively as $\bar\delta(\hat x) = (\delta_2, \dots, \delta_n)\big(\delta_1(\hat x)\big)$. Figure 1 displays an example edit script.
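To make the edit semantics concrete, here is an illustrative sketch of the three edit types on trees stored as mutable [label, children] lists, with nodes addressed by preorder (depth-first) index. This is a simplified variant for exposition (e.g. the insertion adopts all former children from position c onward), not the paper's exact implementation.

```python
def preorder(t, out=None):
    """Lists all subtrees of t in preorder, so edits can address node i."""
    if out is None:
        out = []
    out.append(t)
    for child in t[1]:
        preorder(child, out)
    return out

def rep(t, i, y):
    """rep_{i,y}: relabel the i-th node with symbol y."""
    preorder(t)[i][0] = y
    return t

def delete(t, i):
    """del_i: remove the i-th node (not the root), moving its children
    up to its parent."""
    nodes = preorder(t)
    target = nodes[i]
    for node in nodes:
        for pos, child in enumerate(node[1]):
            if child is target:
                node[1][pos:pos + 1] = target[1]
                return t
    raise ValueError('cannot delete the root')

def ins(t, i, c, y):
    """ins_{i,c,y}: insert a new node labeled y as c-th child of node i,
    moving the former children from position c onward down."""
    node = preorder(t)[i]
    node[1] = node[1][:c] + [[y, node[1][c:]]]
    return t

def apply_script(script, t):
    """Applies an edit script delta_1, ..., delta_n left to right."""
    for edit in script:
        t = edit(t)
    return t
```

For example, deleting node 1 of a(b(a, a), a) moves the two inner leaves up and yields a(a, a, a).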
Finally, we define the tree edit distance $d(\hat x, \hat y)$ as the length of the shortest script which transforms $\hat x$ into $\hat y$, i.e. $d(\hat x, \hat y) = \min \{ |\bar\delta| : \bar\delta(\hat x) = \hat y \}$. This tree edit distance can be computed efficiently via dynamic programming in $\mathcal{O}(|\hat x|^2 \cdot |\hat y|^2)$ (Zhang1989). We note that several variations of the tree edit distance with other edit models exist, which are readily compatible with our approach (Bille2005; Paassen2018ICML). For brevity, we focus on the classic tree edit distance in this paper.
Random baseline attack:
The concept of tree edits yields a baseline attack approach for trees. Starting from a tree $\hat x$ with label $f(\hat x)$, we apply random tree edits, yielding another tree $\hat z$, until $f(\hat z) \neq f(\hat x)$. To make this more efficient, we double the number of edits in each iteration until the label changes, yielding an edit script $\bar\delta = \delta_1, \dots, \delta_n$, and then use binary search to identify the shortest prefix $\bar\delta_{1:j} = \delta_1, \dots, \delta_j$ such that $f(\bar\delta_{1:j}(\hat x)) \neq f(\hat x)$. This reduces the number of queries to $f$ to $\mathcal{O}(\log n)$.
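The doubling-plus-binary-search scheme can be sketched as follows. Here, `f` is the black-box classifier and `apply_prefix(x, m)` applies the first m edits of one fixed random edit script; both are illustrative stand-ins, not the paper's exact interface.

```python
def random_attack(x, f, apply_prefix, max_edits=1024):
    """Doubles the prefix length until the label flips, then binary-searches
    the shortest label-flipping prefix, using O(log n) queries to f."""
    label = f(x)
    n = 1
    while f(apply_prefix(x, n)) == label:
        n *= 2
        if n > max_edits:
            return None  # attack aborted, mirroring the evaluation protocol
    # a prefix of length n flips the label; length n // 2 did not
    lo, hi = n // 2 + 1, n
    while lo < hi:
        mid = (lo + hi) // 2
        if f(apply_prefix(x, mid)) != label:
            hi = mid
        else:
            lo = mid + 1
    return apply_prefix(x, lo)
```

In a toy setting where each "edit" increments an integer and the classifier flips at a threshold, the search returns exactly the threshold point, i.e. the shortest successful prefix.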
Note that this random attack scheme may find solutions $\hat z$ which are far away from $\hat x$, thus limiting their plausibility as adversarial examples. To account for such cases, we restrict Problem 1 further and impose that $\hat z$ only counts as a solution if it is still closer to $\hat x$ than to any point $\hat y$ which is correctly classified and has a different label than $\hat x$ (refer to Figure 2).
Another drawback of our random baseline is that it cannot guarantee results after a fixed number of edits, because we may not yet have explored enough trees to have crossed the classification boundary. We address this limitation with our proposed attack method, backtracing attacks.
Backtracing attack:
For any two trees $\hat x$ and $\hat y$, we can compute a co-optimal edit script $\bar\delta$ with $\bar\delta(\hat x) = \hat y$ and $|\bar\delta| = d(\hat x, \hat y)$ via a technique called backtracing (Paassen2018arxiv, refer to Algorithm 6 and Theorem 16). This forms the basis for our proposed attack. In particular, we select for a starting tree $\hat x$ the closest neighbor $\hat y$ with the target label $t$. Then, we use backtracing to compute the shortest script $\bar\delta$ from $\hat x$ to $\hat y$. Because $f(\hat y) = t \neq f(\hat x)$, this script is guaranteed to change the label at some point. We then apply binary search to identify the shortest prefix $\bar\delta_{1:j}$ of $\bar\delta$ which still changes the label (refer to Figure 2). Refer to Algorithm 1 for the details of the algorithm.
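The prefix search at the core of the attack reduces to a standard binary search over prefix lengths. A sketch, assuming `script` is the co-optimal edit script from x to its nearest differently-labeled neighbor y (e.g. computed with the edist package) and `apply_script` applies a list of edits in order:

```python
def backtracing_attack(x, f, script, apply_script):
    """Finds the shortest prefix of `script` that changes f's prediction.
    The full script is guaranteed to change it, since it transforms x
    into a tree carrying a different label."""
    label = f(x)
    lo, hi = 1, len(script)
    while lo < hi:
        mid = (lo + hi) // 2
        if f(apply_script(script[:mid], x)) != label:
            hi = mid
        else:
            lo = mid + 1
    return apply_script(script[:lo], x)
```

Because only prefix lengths are probed, the number of classifier queries is logarithmic in the script length.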
Note that we can upper-bound the length of $\bar\delta$ by $|\bar\delta| \leq |\hat x| + |\hat y|$, because in the worst case we delete $\hat x$ entirely and then insert $\hat y$ entirely. Accordingly, our attack finishes after at most $\mathcal{O}(\log(|\hat x| + |\hat y|))$ steps/queries to $f$. Finally, because $\hat y$ is the closest tree with label $t$ to $\hat x$, our attack is guaranteed to yield a successful adversarial example $\hat z = \bar\delta_{1:j}(\hat x)$ if our prefix is shorter than half of $\bar\delta$, because then $d(\hat x, \hat z) \leq j < d(\hat x, \hat y)/2$, which implies that $d(\hat x, \hat z) < d(\hat z, \hat y)$. In other words, we are guaranteed to find a solution to Problem 1, in the sense that the label is guaranteed to change to $t$, and that our solution is the closest tree to $\hat x$ along the shortest script towards $\hat y$.
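The guarantee rests on the triangle inequality for the tree edit distance; spelled out:

```latex
d(\hat x, \hat z) \;\le\; j \;<\; \frac{d(\hat x, \hat y)}{2}
\quad\Longrightarrow\quad
d(\hat z, \hat y) \;\ge\; d(\hat x, \hat y) - d(\hat x, \hat z)
\;>\; \frac{d(\hat x, \hat y)}{2} \;>\; d(\hat x, \hat z)
```

so $\hat z$ is strictly closer to $\hat x$ than to $\hat y$; and since any other correctly classified point $\hat y'$ with a different label satisfies $d(\hat x, \hat y') \ge d(\hat x, \hat y)$, the same argument yields $d(\hat z, \hat y') > d(\hat x, \hat z)$ as well.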
4 Experiments

In our evaluation, we attack seven different tree classifiers on four data sets. As outcome measures, we consider the success rate, i.e. the fraction of test data points for which the attack could generate a successful adversarial example according to the definition in Figure 2; and the distance ratio $d(\hat x, \hat z) / d(\hat z, \hat y)$, i.e. how much closer $\hat z$ is to $\hat x$ compared to other points $\hat y$ with the same label as $\hat z$. To avoid excessive computation times, we abort random adversarial attacks that have not succeeded after a fixed budget of tree edits. Accordingly, the distance ratio is not available for random attacks that have been aborted, yielding some n.a. entries in our results (Table 1).
Our experimental hypotheses are that backtracing attacks succeed more often than random attacks due to their targeted nature (H1), but that random attacks achieve lower distance ratios (H2), because they have a larger search space from which to select close adversarial examples.
We perform our evaluation on four tree classification data sets from (Paassen2018ICML), in particular MiniPalindrome and Sorting as data sets of Java programs, as well as Cystic and Leukemia from the biomedical domain. The latter three data sets are (imbalanced) binary classification problems, while the first is a six-class problem. We perform all experiments in a crossvalidation following the protocol of (Paassen2018ICML), with the number of folds chosen per data set as in that reference.
On each data set, we train seven different classifiers, namely five support vector machines (SVM) with different kernels and two recursive neural network types. As the first two kernels, we consider the double centering kernel (linear; (Gisbrecht2015)) based on the tree edit distance, and the radial basis function kernel (RBF) $k(\hat x, \hat y) = \exp\big(-d(\hat x, \hat y)^2 / (2\sigma^2)\big)$, for which we optimize the bandwidth parameter $\sigma$ in a nested crossvalidation, with candidate values proportional to the average tree edit distance in the data set. We ensure positive semi-definiteness for these kernels via the clip eigenvalue correction (Gisbrecht2015). Further, we consider three tree kernels, namely the subtree kernel (ST), which counts the number of shared proper subtrees, the subset tree kernel (SST), which counts the number of shared subset trees, and the partial tree kernel (PT), which counts the number of shared partial trees (Aiolli2011). All three kernels have a decay hyper-parameter $\lambda$, which regulates the influence of larger subtrees and which we optimize in a nested crossvalidation for each kernel. For all SVM instances, we also optimize the regularization hyper-parameter $C$ in a nested crossvalidation.
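Both the distance-substitution RBF kernel and the clip correction are straightforward to state on a precomputed distance matrix. A sketch, assuming `D` holds pairwise tree edit distances; the exact bandwidth parametrization used in the experiments may differ:

```python
import numpy as np

def rbf_kernel(D, sigma):
    """k(x, y) = exp(-d(x, y)^2 / (2 * sigma^2)), applied elementwise to a
    precomputed distance matrix D."""
    return np.exp(-D ** 2 / (2. * sigma ** 2))

def clip_psd(K):
    """Clip eigenvalue correction: sets negative eigenvalues of the
    symmetric kernel matrix K to zero, making it positive semi-definite."""
    eigvals, eigvecs = np.linalg.eigh(K)
    return eigvecs @ np.diag(np.maximum(eigvals, 0.)) @ eigvecs.T
```

The corrected matrix can then be passed to an SVM with a precomputed kernel, e.g. scikit-learn's `SVC(kernel='precomputed')`.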
As neural network variants, we first consider recursive neural networks (Sperduti1997), which map a tree $\hat x = x(\hat x_1, \dots, \hat x_k)$ to a vector $G(\hat x) \in \mathbb{R}^n$ by means of the recursive function $G(\hat x) = \sigma\big(W_x \cdot \sum_{j=1}^k G(\hat x_j) + \vec b_x\big)$, where $\sigma$ is the logistic function and the matrices $W_x$ as well as the bias vectors $\vec b_x$ for all $x \in \Sigma$ are the parameters of the model. We classify a tree by means of another linear layer with one output for each of the $L$ classes, i.e. $f(\hat x) = \operatorname{argmax}_l \, [V \cdot G(\hat x) + \vec c]_l$, where $V$ and $\vec c$ are parameters of the model and $[\vec v]_l$ denotes the $l$th entry of vector $\vec v$. We trained the network using the crossentropy loss and Adam (Kingma2015) as optimizer until the training loss dropped below a fixed threshold. Note that the number of embedding dimensions $n$ is a hyper-parameter of the model, which we kept fixed, as this was sufficient to achieve the desired training loss. Finally, we consider tree echo state networks (TES), which have the same architecture as recursive neural networks, but where the recursive weight matrices $W_x$ and the bias vectors $\vec b_x$ remain untrained after random initialization. Only the output parameters $V$ and $\vec c$ are trained via simple linear regression. The scaling of the recursive weight matrices and the bias vectors are hyper-parameters of the model, which we optimize in a nested crossvalidation via grid search.
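A minimal sketch of the recursive embedding and the output layer, assuming one weight matrix and one bias vector per alphabet symbol and children combined by summation; the paper's exact parametrization may differ, and the training loop (crossentropy loss plus Adam) is omitted.

```python
import numpy as np

def logistic(a):
    return 1. / (1. + np.exp(-a))

def embed(t, W, b):
    """G(x(x_1, ..., x_k)) = logistic(W_x @ sum_j G(x_j) + b_x), with trees
    given as (label, children) pairs and W, b keyed by symbol."""
    label, children = t
    child_sum = sum((embed(c, W, b) for c in children),
                    np.zeros_like(b[label]))
    return logistic(W[label] @ child_sum + b[label])

def classify(t, W, b, V, c):
    """Linear output layer with one output per class, argmax decision."""
    return int(np.argmax(V @ embed(t, W, b) + c))
```

In a tree echo state network, `W` and `b` would stay at their random initialization and only `V` and `c` would be fitted by linear regression on the embeddings.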
As implementations, we use the scikit-learn version of SVM, the edist package (https://gitlab.ub.uni-bielefeld.de/bpaassen/python-edit-distances) for the tree edit distance and its backtracing, the ptk toolbox (http://joedsm.altervista.org/pythontreekernels.htm) for the ST, SST, and PT kernels (Aiolli2011), a custom implementation of recursive neural networks using pytorch (pytorch2017), and a custom implementation of tree echo state networks. All implementations and experiments are available at https://gitlab.ub.uni-bielefeld.de/bpaassen/adversarial-edit-attacks. We perform all experiments on a consumer-grade laptop with an Intel i7 CPU.
Results and Discussion:
Table 1 displays the mean classification error ± standard deviation across crossvalidation folds, as well as the success rates and distance ratios for random attacks and backtracing attacks, for all data sets and classifiers.
| classifier | accuracy | success rate (random) | dist. ratio (random) | success rate (backtracing) | dist. ratio (backtracing) |
We evaluate our results statistically by aggregating all crossvalidation folds across data sets and comparing success rates and distance ratios between both attack types in a one-sided Wilcoxon signed-rank test with Bonferroni correction. We observe that backtracing attacks have significantly higher success rates for the linear and RBF kernel SVM, slightly higher rates for the ST and SST kernels, indistinguishable success for the PT kernel, and lower success rates for the recursive and tree echo state networks. This generally supports our hypothesis that backtracing attacks have higher success rates (H1), except for both neural network models. The difference is especially pronounced for the Cystic and Leukemia data sets, where random attacks against SVM models always failed.
Regarding H2, we observe that random attacks achieve lower distance ratios for the ST, SST, and PT kernels, and much lower ratios for recursive neural nets and tree echo state nets. For the linear and RBF kernels, the distance ratios are statistically indistinguishable. This supports H2.
5 Conclusion

In this contribution, we have introduced a novel adversarial attack strategy for tree data based on tree edits, in one random and one backtracing variant. We observe that backtracing attacks achieve more consistent and reliable success across data sets and classifiers compared to the random baseline; only for recursive neural networks are random attacks more successful. We also observe that the search space for backtracing attacks may be too constrained, because random attacks generally find adversarial examples that are closer to the original sample. Future research could therefore consider alternative search spaces, e.g. based on semantic considerations. Most importantly, our research highlights the need for defense mechanisms against adversarial attacks on tree classifiers, especially for neural network models.