On the transferability of adversarial examples between convex and 01 loss models

by   Yunzhe Xue, et al.

We show that white box adversarial examples do not transfer effectively between convex and 01 loss and between 01 loss models compared to between convex models. We also show that convex substitute model black box attacks are less effective on 01 loss than convex models, and that 01 loss substitute model attacks are ineffective on both convex and 01 loss models. We show intuitively by example how the presence of outliers can cause different decision boundaries between 01 and convex loss models which in turn produces adversaries that are non-transferable. Indeed we see on MNIST that adversaries transfer between 01 loss and convex models more easily than on CIFAR10 and ImageNet which are likely to contain outliers. We also show intuitively by example how the non-continuity of 01 loss makes adversaries non-transferable in a two layer neural network.


1 Introduction

State of the art machine learning algorithms can achieve high accuracies in classification tasks but misclassify minor perturbations of the data known as adversarial examples (Goodfellow et al., 2014; Papernot et al., 2016b; Kurakin et al., 2016; Carlini & Wagner, 2017; Brendel et al., 2017). Data corruptions that go beyond adversarial perturbations, such as changes in image brightness, contrast, fog, and snow, also pose a challenge to machine learning methods (Hendrycks & Dietterich, 2019).

The 01 loss is known to be robust to outliers (Bartlett et al., 2004) and to label noise in the training data (Manwani & Sastry, 2013; Ghosh et al., 2015). Does the robustness of 01 loss also extend to adversarial data? We study this in the setting of white box and substitute model black box attacks. Computationally 01 loss presents a considerable challenge because it is NP-hard to solve (Ben-David et al., 2003). Previous attempts (Zhai et al., 2013; Tang et al., 2014; Shalev-Shwartz et al., 2011; Li & Lin, 2007; Nguyen & Sanner, 2013) lack on-par test accuracy with convex solvers and are slow and impractical for large multiclass image benchmarks, except for the recent stochastic coordinate descent (Xie et al., 2019).

We propose a two layer neural network with sign activation (01) loss, which to the best of our knowledge is the first such network to be proposed. We train it with stochastic coordinate descent and show that it achieves on-par test accuracy to equivalent convex models. We then proceed with white box and substitute model black box attacks on image benchmarks MNIST (LeCun et al., 1998), CIFAR10 (Krizhevsky, 2009), and Mini ImageNet (a ten class subset of the original ImageNet (Russakovsky et al., 2015)) where we make interesting findings.

2 Results

We refer to our linear (no hidden layer) and non-linear (single hidden layer with 20 nodes) models as SCD01 and MLP01 respectively. See our Supplementary Material for their objectives, optimization algorithms, and runtime and accuracies on image classification benchmarks. As convex counterparts we select the linear support vector machine (with a cross-validated regularization parameter) denoted as SVM and a two layer 20 hidden node neural network with logistic loss (MLP). For multiclass we use one-vs-all for all four methods. We use the majority vote of 32 runs for our 01 loss models to improve stability and do the same for SVM and MLP by majority voting on 32 bootstrapped samples. Our implementations and experimental platforms are given in detail in the Supplementary Material. Our SCD01 and MLP01 source codes, supplementary programs, and data are available from

We refer to the accuracy on the test data as clean data test accuracy. An incorrectly classified adversarial example is considered a successful attack whereas a correctly classified adversarial example is a failed one. Thus when we refer to the accuracy of adversarial examples it is the same as 100% minus the attack success rate. The lower the accuracy the more effective the attack.

2.1 White box attacks

In this section we study white box attacks just for binary classification on classes 0 and 1 in each of the three datasets. We use single runs of each of the four models to generate adversaries using the model parameters. We use the same white box attack method (Papernot et al., 2016a) for SVM and SCD01 since both are linear classifiers: for a given datapoint $x$ and its label $y \in \{-1, +1\}$ the adversary $x'$ is obtained by moving $x$ against its label, in the direction determined by $-y\,w$ for the model weights $w$, where $\epsilon$ is the distortion.

For MLP we use the fast gradient sign method (FGSM) (Goodfellow et al., 2014). In this method we generate an adversary using the sign of the model gradient, $x' = x + \epsilon\,\mathrm{sign}(\nabla_x J(x, y))$, where $J$ is the model objective and $\nabla_x J$ is the model gradient with respect to the data $x$. We generate white box adversaries for MLP01 with a simple heuristic: for each hidden node $w_k$ we distort the input $x$ by $\epsilon$ with respect to that node's boundary (using $y_k$, the output of $x$ from the hidden node $w_k$) and accept the first modification that misclassifies $x$ in the final node output. If $x$ is already misclassified or if none of the hidden node distortions misclassify it we distort $x$ with a randomly selected hidden node. We provide the full algorithm in the Supplementary Material. We use $\epsilon$ values on MNIST, CIFAR10, and ImageNet that are typical in the literature.
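The FGSM step described above can be sketched for a single logistic unit, where the input gradient has a closed form. This is only an illustration under stated assumptions: the weights, the datapoint, and the helper names (`input_gradient`, `fgsm`) are hypothetical, labels are taken in {0, 1}, and the paper's MLP has a hidden layer that this sketch omits.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(w, w0, x, y):
    """Gradient of the logistic loss with respect to the input x (label y in {0, 1})."""
    return (sigmoid(x @ w + w0) - y) * w

def fgsm(w, w0, x, y, eps):
    """Shift every feature by eps in the direction that increases the loss."""
    return x + eps * np.sign(input_gradient(w, w0, x, y))

# Hypothetical weights and datapoint, chosen only for illustration
w, w0 = np.array([2.0, -1.0, 0.5]), 0.0
x, y = np.array([0.2, 0.1, 0.3]), 1
x_adv = fgsm(w, w0, x, y, eps=0.3)

print(sigmoid(x @ w + w0) > 0.5)      # is the clean point on the positive side?
print(sigmoid(x_adv @ w + w0) > 0.5)  # is the adversary on the positive side?
```

Every feature moves by exactly `eps`, which is why the distortion can be read as a per-pixel change. The sketch does not clip the result back to the valid pixel range, which a real image attack would likely need.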

In Table 1 we see that the clean accuracies of our 01 loss models are comparable to the convex counterparts (with more shown in the Supplementary Material). As expected adversaries from the source on the same target are effective except for MLP01. More interestingly, while adversaries from SVM and MLP affect each other considerably they are far less pronounced on SCD01 and MLP01. We see this very clearly on CIFAR10 where both SVM and MLP adversaries have almost 0% accuracy when attacking each other indicating high transferability (Papernot et al., 2016a). But SVM and MLP adversaries on SCD01 and MLP01 have a far less effect in this dataset. Adversaries from MLP attain a 63.7% accuracy on MLP01 and 43.5% on SCD01. Another interesting observation is that adversaries barely transfer between SCD01 and MLP01. We see similar behavior on Mini ImageNet and to a lesser degree on MNIST.

Source          SVM     SCD01   MLP     MLP01
MNIST
Clean           100     99.9    100     100
SVM             11.9    8.1     40.4    43.5
SCD01           97      0       98.5    53.2
MLP             25.5    16.1    31      42.3
MLP01           99.9    99.8    99.6    69.5
CIFAR10
Clean           82.2    81.1    88.7    84.2
SVM             0       41.3    0.5     70.1
SCD01           76      0.8     86      84.5
MLP             0       43.5    0.4     63.7
MLP01           81.7    80      88.5    66.9
Mini ImageNet
Clean           60.7    67.5    66.1    68.7
SVM             0       54.9    21.2    53.8
SCD01           58.6    1       65      60.3
MLP             0.5     42      21.6    52.3
MLP01           60.8    65.1    65.8    35.7
Table 1: Accuracy (%) of adversaries made by the source model shown in the first column targeting models shown in the top row. The Clean rows give the accuracy of each target on unperturbed test data. We consider only binary classification here between classes 0 and 1.

We argue that the difference of loss functions (01 vs convex) may be responsible for different boundaries and non-transferability. We illustrate this in two examples. First we see the effect of outliers on 01 loss and hinge loss linear classifiers. Recall that the hinge loss is

$$\max(0, 1 - y f(x))$$

where $y$ is the label and $f(x)$ is the prediction of $x$ given by the classifier $f$. In Figure 1(a) the misclassified outlier forces the hinge loss to give a skewed linear boundary with two misclassifications. This happens because even though the two points are misclassified by the red boundary they are closer to it than the single misclassified one is to the blue one. The 01 loss is unaffected by distances and thus gives the blue boundary with one misclassification. Since the two boundaries have different orientations their adversaries are also likely to be different. In a dataset like MNIST where our accuracies are high we don't expect many misclassified outliers and thus boundaries are unlikely to be different. As a result we see that many adversaries transfer between SVM and SCD01 on MNIST. But on CIFAR10 and Mini ImageNet, which are more complex and likely to contain misclassified outliers, we expect different boundaries which in turn gives fewer adversaries that transfer between the two.
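The outlier argument can be checked numerically on a small toy set. The sketch below is not the data behind Figure 1: the points, the two candidate boundaries named `blue` and `red`, and the helper functions are hypothetical values picked so that one far-away outlier dominates the hinge loss. It only compares two fixed boundaries and does not solve either optimization problem.

```python
import numpy as np

# Hypothetical 2-D toy data with labels in {-1, +1}; the last point is a negative outlier
X = np.array([[2.0, 2.0], [3.0, 1.0], [1.0, -1.2],      # positives
              [-2.0, -2.0], [-3.0, -1.0], [-1.0, 1.2],  # negatives
              [4.0, -6.0]])                              # negative outlier
y = np.array([1, 1, 1, -1, -1, -1, -1])

def scores(w, w0):
    return X @ w + w0

def zero_one_errors(w, w0):
    """Number of misclassified points (distance to the boundary is ignored)."""
    return int(np.sum(y * scores(w, w0) <= 0))

def hinge_total(w, w0):
    """Sum of hinge losses max(0, 1 - y f(x)) over all points."""
    return float(np.sum(np.maximum(0.0, 1.0 - y * scores(w, w0))))

blue = (np.array([1.0, 0.0]), 0.0)   # misclassifies only the outlier, but by a wide margin
red = (np.array([0.7, 0.7]), 0.0)    # skewed: misclassifies two points that lie close to it

for name, (w, w0) in (("blue", blue), ("red", red)):
    print(name, zero_one_errors(w, w0), round(hinge_total(w, w0), 2))
```

On this data the blue boundary makes fewer errors while the red one has the smaller total hinge loss, which mirrors the disagreement between the two losses described above.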

Next we see the difference between convex and 01 loss in a simple two hidden node network. In Figure 1(b) we see two hyperplanes on the left whose logistic outputs give the hidden feature space on the right. The two hyperplanes represent two hidden nodes in a two layer network. Recall that the logistic activation $1/(1+e^{-z})$ (where $z$ is the prediction of $x$ given by a hyperplane) is similar to 01 loss: for large values of $|z|$ it approaches 0 or 1 depending upon the sign of $z$, and it approaches $1/2$ as $z$ approaches 0. Thus if we move the red circle towards the "corner" in the original feature space (as shown in Figure 1(b)) its outputs from the two hyperplanes approach $1/2$ in the hidden space. Consequently it crosses the linear boundary in the hidden space and becomes adversarial. However if the activation is 01 loss the red point remains unmoved in the hidden space. In fact in 01 loss a datapoint's value in the hidden space changes only if we cross a boundary in the original space.

While both examples are not formal proofs they give some intuition of why fewer adversarial examples transfer between 01 loss and convex loss compared to between just convex. In particular we see that for 01 loss a datapoint becomes potentially adversarial if and only if it crosses a boundary in the original feature space whereas this is not true for convex losses.
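The second intuition, that a point can drift in the hidden space of a logistic network without crossing any boundary in the original space, can be illustrated as follows. The two axis-aligned hyperplanes and the two input points are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

# Two hypothetical hidden nodes: hyperplanes x1 = 0 and x2 = 0
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # one row per hidden node
b = np.zeros(2)

def logistic_hidden(x):
    """Hidden representation with logistic activation."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

def sign_hidden(x):
    """Hidden representation with sign (01) activation."""
    return np.sign(W @ x + b)

x_far = np.array([3.0, 3.0])    # well inside the positive side of both hyperplanes
x_near = np.array([0.1, 0.1])   # moved towards the corner, no boundary crossed

print(logistic_hidden(x_far), logistic_hidden(x_near))   # second one is closer to 0.5
print(sign_hidden(x_far), sign_hidden(x_near))           # identical
```

With the logistic activation the hidden coordinates slide towards 1/2 as the input approaches the corner, so a linear boundary in the hidden space can be crossed. With the sign activation the hidden coordinates do not change until the input crosses one of the hyperplanes.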

(a) Just one point is misclassified by the blue boundary but its hinge loss, shown with dotted lines, is much higher than the loss of the two points that are misclassified by the red boundary. Thus the hinge loss favors the red skewed line.
(b) The logistic activation in the original space gives a linear separation in the hidden space. If we move the red circle towards the "corner" of the boundaries its distance to both hyperplanes decreases. This in turn makes its activation values approach one half and it becomes misclassified in the hidden space. If the activation is 01 loss the red circle is not affected in the hidden space.
Figure 1: Toy example showing different 01 loss and hinge boundaries, and adversarial examples in a simple logistic loss network.

Interestingly we also see MLP01 adversaries don’t transfer to the other three models. When applied to MLP01 the adversaries lower its accuracy relative to clean data but to lesser degree than other models attacking themselves. Thus our white box attack method for MLP01 may not be the most powerful one leaving this an open problem.

2.2 Substitute model black box attacks

We see that white box adversaries don't transfer between convex and 01 loss but can we attack a 01 loss model with a convex substitute model (Papernot et al., 2016a)? In this subsection we consider binary and multiclass classification on all three datasets. For all four methods we use 32 votes and one-vs-all multiclass classification. We use adversarial data augmentation (Papernot et al., 2016a) to iteratively train a substitute model on label outputs from the target model. In each epoch we generate white box adversaries targeting the substitute model with the FGSM method (Goodfellow et al., 2014) and evaluate them on the target. Note that our black box attack is untargeted: we are mainly interested in misclassifying the data and not in the misclassification label. See the Supplementary Material for the full substitute model learning algorithm, which is essentially the method of Papernot et al. (2016a).

2.2.1 Convex substitute model

Figure 2: Multiclass untargeted black box attack with a dual 200 node hidden layer logistic loss network as the substitute model. In epoch 0 are the clean test accuracies. Panels: (a) MNIST, (b) CIFAR10 binary (class 0 and 1), (c) CIFAR10, (d) Mini ImageNet.

In Figure 2 we see the accuracy of target models on adversaries generated from a convex substitute model. Specifically we use a dual hidden layer neural network with logistic loss and 200 nodes in each hidden layer as the substitute model. As in the white box attacks we use $\epsilon$ values commonly used on these datasets. On MNIST (Figure 2(a)) we see a rapid drop in accuracy in the first few epochs and a somewhat flat curve after epoch 10. We don't see a considerable difference between the 01 loss and convex sibling models on MNIST although MLP01 has the highest accuracy.

On CIFAR10 and Mini ImageNet we see much more pronounced differences. In CIFAR10 binary classification (Figure 2(b)) we see that even though both MLP and MLP01 start off with clean test accuracies of 88% and 86% respectively, at the end of the 20th epoch MLP01 has 58% accuracy on adversarial examples while MLP has 7% accuracy. We see similar results on Mini ImageNet binary classification in the Supplementary Material. In CIFAR10 multiclass (Figure 2(c)) at the end of the 20th epoch the difference in accuracy between MLP and MLP01 is 24% even though both methods start off with about the same accuracy on clean test data. Similarly on Mini ImageNet MLP01 is 20.7% higher in accuracy than MLP in the 20th epoch. This is particularly interesting since MLP01 started off with a higher accuracy on Mini ImageNet and in general we expect more accurate models to be less robust (Raghunathan et al., 2019; Zhang et al., 2019; Tsipras et al., 2018). However that is not the case here. Even if we give MLP the advantage of 400 hidden nodes in a shared weight network instead of one-vs-all, its accuracy in the 20th epoch is 13% lower than MLP01.

We have already seen earlier in white box attacks that adversaries transfer between SVM and SCD01 on MNIST but not so much on CIFAR10 and Mini ImageNet. The same phenomenon can explain the results we see here. On MNIST the convex substitute model can attack SCD01 and MLP01 as effectively as convex models due to better transferability on MNIST. Due to poor transferability on CIFAR10 and Mini ImageNet we see that the attack is less effective on SCD01 and MLP01. In the next subsection we explore what happens if the substitute model is SCD01.

2.2.2 01 loss substitute model

Figure 3: We use SCD01 single run as the substitute model to attack single runs of the target models between only classes 0 and 1 in CIFAR10. In epoch 0 are the clean test accuracies.

In Figure 3 we see the results of a black box attack with SCD01 single run as the substitute model attacking single runs of target models. We see that adversaries produced from this model hardly affect any of the target models in any of the epochs. Even when the target is SCD01 and trained with the same initial seed as the substitute the adversaries are ineffective.

Further investigation reveals that the percentage of test data whose labels match between the 01 loss substitute and its target (known as the label match rate) is high but the label match rate on adversarial examples is much lower (shown in Supplementary Material). Thus even though the SCD01 manages to approximate the target boundary its direction is different which gives ineffective adversaries. This is due to the non-uniqueness of 01 loss which makes single run solutions different from each other. Thus as a substitute model in black box attacks 01 loss is ineffective even in attacking itself.

3 Conclusion

There is nothing to indicate that 01 loss models are robust to black box attacks that do not require substitute model training (Brendel et al., 2017; Chen et al., 2019). These are, however, computationally more expensive and require separate computations for each example. A transfer based model can be more effective (and dangerous) once it has approximated the target model boundary.

Can we further decrease transferability by introducing artificial noise so that 01 loss and convex boundaries are even more different, particularly on datasets like MNIST? We explore this in a separate study.


4 Supplementary Material

4.1 Background

The problem of determining the hyperplane with minimum number of misclassifications in a binary classification problem is known to be NP-hard (Ben-David et al., 2003). In mainstream machine learning literature this is called minimizing the 01 loss (Shai et al., 2011) given in Objective 1,

$$\arg\min_{w, w_0} \frac{1}{2n} \sum_{i} \left(1 - \mathrm{sign}\left(y_i (w^T x_i + w_0)\right)\right) \qquad (1)$$

where $w \in \mathbb{R}^d$, $w_0 \in \mathbb{R}$ is our hyperplane, and $x_i \in \mathbb{R}^d$, $y_i \in \{+1, -1\}$ are our $n$ training data. Popular linear classifiers such as the linear support vector machine, perceptron, and logistic regression (Alpaydin, 2004) can be considered as convex approximations to this problem that yield fast gradient descent solutions (Bartlett et al., 2004). However, they are also more sensitive to outliers than the 01 loss (Bartlett et al., 2004; Nguyen & Sanner, 2013; Xie et al., 2019) and more prone to mislabeled data than 01 loss (Manwani & Sastry, 2013; Ghosh et al., 2015; Lyu & Tsang, 2019).
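As a concrete reading of the linear 01 loss objective, the sketch below computes the fraction of training points that a hyperplane misclassifies. The function name and the toy data are hypothetical, labels are assumed to be in {-1, +1}, and a point lying exactly on the hyperplane is counted as an error here, which is a convention of this sketch rather than something stated in the text.

```python
import numpy as np

def zero_one_loss(w, w0, X, y):
    """Fraction of rows of X misclassified by the hyperplane (w, w0)."""
    margins = y * (X @ w + w0)
    return float(np.mean(margins <= 0))

# Hypothetical toy data: the last point sits on the wrong side of the hyperplane
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.0], [-2.0, 0.5]])
y = np.array([1, 1, -1, 1])
w, w0 = np.array([1.0, 1.0]), 0.0

loss = zero_one_loss(w, w0, X, y)
print(loss)
```

Because the loss only counts sign disagreements, scaling `w` and `w0` by any positive constant leaves it unchanged, which is one reason the objective has many equivalent minimizers.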

4.2 A two layer 01 loss neural network

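We extend the 01 loss to a simple two layer neural network with $h$ hidden nodes and sign activation that we call the MLP01 loss. This objective for binary classification can be given as

$$\arg\min_{W, W_0, w, w_0} \frac{1}{2n} \sum_{i} \left(1 - \mathrm{sign}\left(y_i \left(w^T \mathrm{sign}(W^T x_i + W_0) + w_0\right)\right)\right)$$

where $W \in \mathbb{R}^{d \times h}$, $W_0 \in \mathbb{R}^h$ are the hidden layer parameters, $w \in \mathbb{R}^h$, $w_0 \in \mathbb{R}$ are the final layer node parameters, $x_i \in \mathbb{R}^d$, $y_i \in \{+1, -1\}$ are our training data, and the inner sign is applied elementwise. While this is a straightforward model to define, optimizing it is a different story altogether. Optimizing even a single node is NP-hard which makes optimizing this network much harder.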
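A forward pass of such a two layer sign-activation network is short to write down. The sketch below assumes the hidden weights are stored with one column per hidden node and maps a zero pre-activation to +1; both are conventions chosen for this illustration, as are the function names and the XOR-style toy data.

```python
import numpy as np

def sign_pm1(z):
    """Sign activation with outputs in {-1, +1}; zero is sent to +1 (assumed convention)."""
    return np.where(z >= 0, 1.0, -1.0)

def mlp01_predict(X, W, W0, w, w0):
    hidden = sign_pm1(X @ W + W0)        # shape (n, h): one column per hidden node
    return sign_pm1(hidden @ w + w0)     # final node output in {-1, +1}

def mlp01_loss(X, y, W, W0, w, w0):
    """Fraction of misclassified points under the two layer 01 loss."""
    return float(np.mean(mlp01_predict(X, W, W0, w, w0) != y))

# Hypothetical XOR-style data that no single hyperplane separates
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])

W = np.array([[1.0, 1.0],
              [1.0, 1.0]])               # both hidden nodes look at x1 + x2
W0 = np.array([-0.5, -1.5])              # node 1 fires for OR, node 2 for AND
w, w0 = np.array([1.0, -1.0]), -1.0      # output: OR and not AND

print(mlp01_loss(X, y, W, W0, w, w0))
```

The hand-picked weights show that two sign-activated hidden nodes are enough to express a non-linear boundary. Finding such weights from data is the hard part that the coordinate descent search addresses.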

4.3 Stochastic coordinate descent for 01 loss

We solve both problems with stochastic coordinate descent based upon earlier work (Xie et al., 2019). We initialize all parameters to random values from the Normal distribution with mean 0 and variance 1. We then randomly select a subset of the training data (known as a batch) and perform the coordinate descent analog of a single step gradient update in stochastic gradient descent (Bottou, 2010). We first describe this for a linear 01 loss classifier which we obtain if we set the number of hidden nodes to zero. In this case the parameters to optimize are the final weight vector $w$ and the threshold $w_0$.

When the gradient is known we step in its negative direction by a factor of the learning rate $\eta$: $w = w - \eta \nabla f(w)$ where $f$ is the objective. In our case since the gradient does not exist we randomly select $k$ features ($k$ set to 64, 128, and 256 for MNIST, CIFAR10, and ImageNet in our experiments), modify the corresponding entries in $w$ by the learning rate (set to 0.17) one at a time, and accept the modification that gives the largest decrease in the objective. Key to our search is a heuristic to determine the optimal threshold $w_0$ each time we modify an entry of $w$. In this heuristic we perform a linear search on a subset of the projection $w^T x_i$ and select the $w_0$ that minimizes the objective.
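One coordinate update of this kind can be sketched as below. This is a simplified illustration and not the authors' implementation: it tries both adding and subtracting the learning rate (an assumption), searches the threshold by brute force over midpoints of the sorted projection instead of the faster heuristic, and skips the weight renormalization mentioned in Algorithm 1. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def zero_one_loss(w, w0, X, y):
    return float(np.mean(y * (X @ w + w0) <= 0))

def best_threshold(proj, y):
    """Brute-force search over thresholds placed between sorted projection values."""
    s = np.sort(proj)
    candidates = np.concatenate(([s[0] - 1.0], (s[:-1] + s[1:]) / 2.0, [s[-1] + 1.0]))
    losses = [np.mean(y * (proj - t) <= 0) for t in candidates]
    i = int(np.argmin(losses))
    return -candidates[i], float(losses[i])      # w0 = -t

def coordinate_step(w, w0, X, y, k, lr, rng):
    """Try k random coordinates and keep the single change that lowers the loss most."""
    best_w, best_w0, best_loss = w, w0, zero_one_loss(w, w0, X, y)
    for j in rng.choice(X.shape[1], size=k, replace=False):
        for delta in (lr, -lr):
            cand = w.copy()
            cand[j] += delta
            cand_w0, loss = best_threshold(X @ cand, y)
            if loss < best_loss:
                best_w, best_w0, best_loss = cand, cand_w0, loss
    return best_w, best_w0, best_loss

# Synthetic linearly separable data, used only to exercise the update
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.where(X @ rng.normal(size=10) >= 0, 1, -1)
w, w0 = rng.normal(size=10), 0.0
start = zero_one_loss(w, w0, X, y)
for _ in range(20):
    w, w0, loss = coordinate_step(w, w0, X, y, k=4, lr=0.17, rng=rng)
print(start, loss)
```

Since a candidate is accepted only when it strictly lowers the loss on the data passed in, the loss on that data never increases from one step to the next. In the stochastic version each step would see a different random batch, so this guarantee holds per batch rather than on the full training set.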

Figure S1: Train and test accuracy of our stochastic coordinate descent on CIFAR10 class 0 vs 1 with different batch sizes (denoted as nrows).

We repeat the above update step on randomly selected batches for a specified number of iterations given by the user. In Figure S1 we show the effect of the batch size (as a percentage of each class to ensure fair sampling) on a linear 01 loss search on CIFAR10 between classes 0 and 1. We see that a batch size of 75% reaches a train accuracy of 80% faster than the other batch sizes. Thus we use this batch size in all our experiments going forward.

We also see that for this batch size the search flattens after 15 iterations (or epochs as given in the figure). We run 1000 iterations to ensure a deep search with an intent to maximize test accuracy. For imbalanced data (that appears in the one-vs-all design) we find that optimizing a balanced version of our objective for half the iterations followed by the default (imbalanced) version gives a lower objective in the end.

In a two layer network we have to optimize our hidden nodes as well. In each of the 1000 iterations of our search we apply the same coordinate update described above, first to the final output node and then a randomly selected hidden node. In preliminary experiments we find this to be fast and almost as effective as optimizing all hidden nodes and the final node in each iteration.

Our intuition is that by searching on just the sampled data we avoid local minima and across several iterations we can explore a broad portion of the search space. Throughout iterations we keep track of the best parameters that minimize our objective on the full dataset. Below we provide full details of our algorithms.

The problem with our search described above is that it will return different solutions depending upon the initial starting point. To make it more stable we run it 32 times from different random seeds and use the majority vote for prediction.

We extend both our linear and non-linear models to a simple one-vs-all approach for multiclass classification. For a dataset with $c$ classes we create one-vs-all classifiers for each of the $c$ classes. From the 32 models we can obtain frequency outputs for a test point using simple counting and use them as confidence scores for each class. From this we output the predicted class as the one with the highest confidence. This is similar in spirit to the typical softmax objective used in convex neural networks except that there we can optimize to obtain the exact confidences given by sigmoid probabilities.
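The counting scheme for one-vs-all voting can be sketched as follows. The array layout (classes by votes by samples) and the function name are assumptions made for this illustration, and the votes are filled in by hand rather than produced by trained models.

```python
import numpy as np

def one_vs_all_predict(votes):
    """votes has shape (n_classes, n_votes, n_samples) with entries in {-1, +1}.
    The confidence of a class is the fraction of its one-vs-all models voting +1."""
    confidence = np.mean(votes == 1, axis=1)          # (n_classes, n_samples)
    return np.argmax(confidence, axis=0), confidence

# Hypothetical votes: 3 classes, 32 votes per class, 4 test points
n_classes, n_votes, n_samples = 3, 32, 4
true_class = np.array([0, 2, 1, 2])
votes = -np.ones((n_classes, n_votes, n_samples))
for s, c in enumerate(true_class):
    votes[c, :24, s] = 1        # 24 of the 32 models for the true class say "yes"

pred, confidence = one_vs_all_predict(votes)
print(pred)
print(confidence[:, 0])
```

Ties between classes are broken by `argmax` in favor of the lowest class index here; the text does not say how ties are handled, so that detail is an artifact of the sketch.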

4.4 Implementation, experimental platform, and image benchmarks

We implement our 01 loss models in Python and Pytorch (Paszke et al., 2019), and both MLP and SVM (LinearSVC class) in scikit-learn (Pedregosa et al., 2011). We optimize MLP with stochastic gradient descent that has a batch size of 200, momentum of 0.9, and learning rate of 0.01 (.001 for ImageNet data). We ran all experiments on Intel Xeon 6142 2.6GHz CPUs and NVIDIA Titan RTX GPU machines (for parallelizing multiple votes). Our SCD01 and MLP01 source codes, supplementary programs, and data are available from

We experiment on three popular image benchmarks: MNIST (LeCun et al., 1998), CIFAR10 (Krizhevsky, 2009), and ImageNet (Russakovsky et al., 2015). Briefly, MNIST is composed of grayscale handwritten digits each of size $28 \times 28$ with 60000 training images and 10000 test images, and CIFAR10 has $32 \times 32$ color images with 50000 training and 10000 test images. ImageNet is a large benchmark with 1000 classes and color images. We extract images from 10 random classes and split them to give a training set of 6144 images and a test set of 6369. We normalize each image in each benchmark by dividing each pixel value by 255.

4.5 Clean accuracy and runtimes

Before going into robustness we first compare the clean test data accuracies and training runtimes of our 01 loss models to their convex counterparts. In Table S1 we see that ensembling SVM and MLP models does not improve the test accuracy over single runs, thus we use a shared weight MLP network with 400 nodes on ImageNet to boost accuracy there. In fact the SVM boundary depends only upon the support vectors and so each ensemble will be the same as long as the support vectors are included. As a reminder we ensemble by taking the majority vote on multiple bootstrapped samples.

The 01 loss models improve considerably on all three datasets by ensembling. This is not too surprising since 01 loss is non-unique and will give different solutions when run multiple times from different initializations. As a result of ensembling their accuracy is comparable to their convex peers. This makes it easier to compare their robustness since we don't have to worry about the robustness vs accuracy tradeoff (Raghunathan et al., 2019; Zhang et al., 2019; Tsipras et al., 2018).

                SVM     SCD01   MLP     MLP01
Single run
MNIST           91.7    83.7    97.6    91.2
CIFAR10         39.9    30.7    50.2    34.3
Mini ImageNet   26      25      32      25.5
32 votes
MNIST           91.7    90.8    97.1    96
CIFAR10         40.2    39.7    47.4    46.4
32 votes (MLP400 is a single run)
                MLP400  SCD01   MLP01
Mini ImageNet   36      34.7    41
Table S1: Accuracy (%) of our 01 loss and convex counterparts on clean test data

In Table S2 we show the runtime of a single run of our 01 loss and convex models on class 0 vs all for each of the three datasets. We don’t claim the most optimized implementation but our runtimes are still somewhat comparable to the convex loss models. Interestingly the convex models take much longer on complex and higher dimensional images in ImageNet compared to MNIST. Our 01 loss model runtimes are similar on MNIST and CIFAR10 because their sizes are similar. On Mini ImageNet since it has fewer training samples than MNIST and CIFAR10 the 01 loss runtimes are also lower.

                SVM     SCD01   MLP     MLP01
MNIST           0.8     171     64      875
CIFAR10         80      150     267     838
Mini ImageNet   659     83      8564    199
Table S2: Runtimes in seconds of single runs of our 01 loss and convex counterparts on class 0 vs all. On Mini ImageNet we show runtimes for MLP with 400 nodes in its hidden layer since 20 nodes has much lower accuracy.

4.6 Label match rates between SCD01 substitute model and target models in black box attack

(a) Percentage of labels that are the same between the substitute model and target model on clean data
(b) Percentage of labels that are the same between the substitute model and target model on adversarial data
Figure S2: In (a) we see that SCD01 as a substitute model can approximate the target boundary as shown in the label match rate between SCD01 and the target. But when we use SCD01 to generate adversaries the match rate is much lower, which indicates that the direction of the SCD01 boundary is very different from the targets and thus its adversaries have very little effect on the target.

4.7 Black box adversarial attacks on class 0 and 1 on Mini-ImageNet with convex substitute model

(a) Black box attack on classes 0 and 1 on Mini-ImageNet with convex substitute model and distortion $\epsilon = 0.0625$
Figure S3: Our 01 loss models are also robust to convex substitute black box attacks in binary classification. Here we see that the accuracies on clean test data are higher than in multiclass classification and yet our models are still robust.

4.8 Coordinate descent

Input: Data (feature vectors) for with labels , , size of pooled features to update , vector and
Output: Vector and

  1. Initialization: If is null then let each feature of be normally drawn from . We set and throughout our search ensure that by renormalizing each time changes.
  2. Let the number of misclassified points with negative be and those with positive be . These are later used in the Optimal Threshold algorithm called Opt (see below) for fast update of our objective.
  3. Compute the initial data projection , sort the projection with insertion sort, and initialize . We also record the value of for the optimal .
  4. Set , .
  while done != 1 do
     Randomly pick of the feature indices.
     for all selected features we update them do
        1. Assume the optimal
        2. Set and
        3. Modify coordinate by , compute data projection , and sort the projection with insertion sort
        4. Set and record this value for feature
        5. Reset to try the next coordinate
     end for
     Pick the coordinate whose update gives the largest decrease in the objective and set to the values given by the best coordinate with ties decided randomly.
  end while
Algorithm 1 Coordinate descent

This is our core coordinate descent algorithm. We perform just one iterative update instead of convergence. We find this to be more accurate and faster.

4.9 Optimal threshold and 01 loss objective value

Input: for with labels , ,
Output: Optimal with minimum (balanced) 01 loss and the loss value

  for  to  do
      =
     if  then
        If then errorplus++
     else if  then
        If then errorplus else errorminus
     else if  then
        If then errorplus++ else errorminus++
     end if
     If is lower than current best objective then and .
  end for
  return ()
Algorithm 2 Opt

This is our fast algorithm to update and the model objective. Once we have the objective for we can calculate it for in constant time.
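The constant-time update can be illustrated with a sorted sweep over the projection values. The sketch below is a plain (unbalanced) version written under assumptions: it predicts +1 when the projection plus the threshold is positive, places candidate thresholds halfway between neighboring sorted values, and uses a single error counter instead of the separate errorplus and errorminus counters of Algorithm 2. The function name and the example values are hypothetical.

```python
import numpy as np

def opt_threshold(proj, y):
    """Return (w0, loss) minimizing the 01 loss of sign(proj + w0) against y in {-1, +1}."""
    order = np.argsort(proj, kind="stable")
    p, lab = proj[order], y[order]
    n = len(p)
    errors = int(np.sum(lab == -1))      # threshold below every point: all predicted +1
    best_err, best_t = errors, p[0] - 1.0
    for i in range(n):
        # Moving the threshold past point i flips its prediction to -1: O(1) update
        errors += 1 if lab[i] == 1 else -1
        if i + 1 < n and p[i + 1] == p[i]:
            continue                     # tied projections cannot be separated
        t = p[i] + 1.0 if i + 1 == n else (p[i] + p[i + 1]) / 2.0
        if errors < best_err:
            best_err, best_t = errors, t
    return -best_t, best_err / n

proj = np.array([-2.0, -1.0, 0.5, 1.0, 3.0])
y = np.array([-1, -1, 1, -1, 1])
w0, loss = opt_threshold(proj, y)
print(w0, loss)
```

After the one-time sort, each candidate threshold costs a single increment or decrement, so the sweep is linear in the number of points. When the projection changes only slightly between coordinate updates, an insertion sort on the nearly sorted array is cheap, which may be why Algorithm 1 uses it.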

4.10 Stochastic coordinate descent for linear 01 loss

Input: Data (feature vectors) with labels , number of votes (Natural numbers), number of iterations per vote (Natural numbers), batch size as a percent of training data , and
Output: Total of pairs of (, ) after each vote

  while  do
     1. Set
     for  to  do
        1. Randomly pick percent of rows as input training data to the coordinate descent algorithm and run it to completion starting with the values of and from the previous call to it (if we set ).
        2. In the next step we calculate the linear 01 loss objective on the full input training set
        if  then
           Set , , and
        end if
     end for
     2. Output and
     3. Set .
  end while
  We output all pairs across the votes. We can use the pair with the lowest objective or the majority vote of all pairs for prediction.
Algorithm 3 Stochastic coordinate descent for linear 01 loss

Our stochastic descent search performs coordinate descent for the model parameters . We keep track of the best parameters across iterations by evaluating the model objective on the full dataset after each iteration.

4.11 Stochastic coordinate descent for two layer 01 loss network

Input: Data (feature vectors) with labels , number of hidden nodes , number of votes (Natural numbers), number of iterations per vote , batch size as a percent of training data , and
Output: Total of sets of after each vote

  1. Initialize all network weights to random values from the Normal distribution .
  2. Set network thresholds to the median projection value on their corresponding weight vectors and to the projection value that minimizes our network objective.
  while  do
     for  to  do
        Randomly pick percent of rows as input training data.
        Run the Coordinate Descent Algorithm 1 on the final output node to completion starting with the values of and from the previous call to it (if we set ). We use learning rate in the coordinate descent.
        Run the Coordinate Descent Algorithm 1 on a randomly selected hidden node ( column in ) starting with the values of and ( entry in ) from the previous call to it (if we set ). We use learning rate in the coordinate descent for the hidden nodes.
        Calculate the two layer network 01 loss objective on the full input training set
        if  then
           Set , , , , and
        end if
     end for
     Output (, , , )
     Set .
  end while
  We output all sets of across the votes. We can use the first set or the majority vote of all sets for predictions.
Algorithm 4 Stochastic coordinate descent for two layer 01 loss network

Our stochastic descent search performs coordinate descent on the final node and then a random hidden node in each iteration. We keep track of the best parameters across iterations by evaluating the model objective on the full dataset after each iteration.

4.12 White box adversarial attacks

Input: MLP01 model vector weights , feature vector and label
Output: Adversarial feature vector

  for each hidden node (each row of in a random order) do
     Evaluate output of from the hidden node as
     Make adversarial w.r.t. the boundary with .
     Evaluate model output of as
     if y not equal to y” then
        Accept adversarial example and exit loop
     end if
  end for
  if no adversarial example found then
     Evaluate output of from the first hidden node as
  end if
Algorithm 5 White box adversaries for MLP01

If the datapoint is already misclassified by our model our attack simply performs the perturbation given by a random hidden node (since the ordering is chosen randomly). Otherwise it picks the distortion of the first random node that makes it misclassified. If no distortion misclassifies the point it distorts the datapoint by the first hidden node in the random ordering.

4.13 Black box adversarial attacks

Input: Model to be attacked, adversarial attacker , and that determine amount of adversarial perturbation in each sample where is used in training the substitute model and is to generate adversaries to attack the target model, dataset with labels (for 10 classes we have ), number of epochs (Natural numbers)

  Set the initial data as 200 random samples from the input dataset.
  for  to  do
     1. Obtain predictions of from black box model
     3. Train attacker with as input training data
     4. With ’s gradient we produce adversarial examples as augmented data to train the substitute with the step below.
     5. For each sample in create adversary where is the gradient of with respect to the data and is given in the input. We randomly decide to add or subtract by a coin flip and found this trick to improve the substitute model accuracy on the input data and produce more effective adversarial examples.
     6. We have the optional step of generating adversaries with the trained substitute model and evaluating their accuracy on the target. In this way we can see the adversarial accuracy of the target models across epochs as we train the substitute. We use the same method described below: generate adversaries on the input dataset minus the 200 samples with the formula and set to the 0.0625 for CIFAR10 and ImageNet and 0.3 for MNIST.
     7. Add new adversarial samples to . This doubles the number of adversarial samples after each iteration until we reach 6400. After this we just replace the adversarial examples from the previous epoch with the new one.
  end for
  Now that our attacker is trained we produce adversaries for the remaining datapoints. For each datapoint in the dataset minus the 200 selected initially to train the substitute we produce adversaries using as in step 5 above but now we use instead of . We now test the accuracy of the target model with the newly generated adversaries. Note that this is an untargeted attack. We just want the datapoint to be misclassified by the model, we don't care which class it is misclassified into.
Algorithm 6 Substitute model training with augmented adversaries

In the above procedure we use the test data as the input when attacking a model on a benchmark. We set for MNIST and CIFAR10 and for ImageNet since these values produce the most effective attack. We use values on MNIST, CIFAR10, and ImageNet that are typical in the literature. For MNIST corresponds to a change of in each pixel and for CIFAR10 and ImageNet corresponds to a change of in each pixel.

When our substitute model is the dual layer network each with 200 hidden nodes we train it with stochastic gradient descent, batch size of 200, learning rate of 0.01, and momentum of 0.9. When it is SCD01 we run 1000 iterations with batch size (nrows) of 75%.
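The substitute training loop of Algorithm 6 can be sketched end to end on synthetic data. This is a heavily simplified illustration and not the authors' code: the substitute is a logistic regression rather than a two hidden layer network, the black box is a hypothetical linear rule, the dataset cap simply keeps the most recent rows, and the adversarial accuracy is measured against the target's own clean predictions. The names `lam` and `eps` stand for the two perturbation sizes used for augmentation and for the final attack, and their values here are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, epochs=200, lr=0.5):
    """Plain gradient descent on the logistic loss; y in {0, 1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def input_grad_sign(w, b, X, y):
    """Sign of the logistic-loss gradient with respect to each input row."""
    p = sigmoid(X @ w + b)
    return np.sign((p - y)[:, None] * w[None, :])

def train_substitute(target, X0, n_epochs, lam, cap, rng):
    S = X0.copy()
    for _ in range(n_epochs):
        labels = target(S)                                  # label-only queries to the black box
        w, b = train_logreg(S, labels)
        coin = rng.choice([-1.0, 1.0], size=(len(S), 1))    # randomly add or subtract
        new = S + coin * lam * input_grad_sign(w, b, S, labels)
        S = np.vstack([S, new])[-cap:]                      # grow the set, keep at most cap rows
    return w, b

# Hypothetical black box: a fixed linear rule the attacker can only query for labels
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
target = lambda X: (X @ w_true > 0).astype(float)

X_all = rng.normal(size=(500, 5))
X0, X_rest = X_all[:50], X_all[50:]
w_sub, b_sub = train_substitute(target, X0, n_epochs=4, lam=0.1, cap=400, rng=rng)

eps = 0.5
y_rest = target(X_rest)
X_adv = X_rest + eps * input_grad_sign(w_sub, b_sub, X_rest, y_rest)
adv_acc = float(np.mean(target(X_adv) == y_rest))
print(adv_acc)
```

The target is never differentiated: only its labels are used, and all gradients come from the substitute. Swapping `train_logreg` for a non-differentiable model such as SCD01 would require a different way to pick the perturbation direction, which this sketch does not cover.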