Adversarial attacks present a challenge to machine learning algorithms typically based on convex losses. State of the art classifiers like the support vector machine Cortes & Vapnik (1995) and neural networks Krizhevsky et al. (2012) achieve high accuracies on test data but are also vulnerable to adversarial attacks based on minor perturbations in the data Goodfellow et al. (2014); Papernot et al. (2016b); Kurakin et al. (2016); Carlini & Wagner (2017); Brendel et al. (2017). To counter adversarial attacks many defense methods have been proposed, with adversarial training being the most popular Szegedy et al. (2013). This is known to improve model robustness but also tends to lower accuracy on clean test data that has no perturbations Raghunathan et al. (2019); Zhang et al. (2019).
The robustness of the 01 loss to outliers is well known Bartlett et al. (2004). Convex loss functions such as least squares are affected by both correctly and incorrectly classified outliers, and the hinge loss is affected by incorrectly classified outliers, whereas the 01 loss is robust to both Xie et al. (2019); Nguyen & Sanner (2013). In addition to being robust to outliers, the 01 loss is also robust to noise in the training data Manwani & Sastry (2013); Ghosh et al. (2015), and under this loss minimizing the empirical risk amounts to minimizing the empirical adversarial risk under certain noise assumptions Lyu & Tsang (2019); Hu et al. (2016). We conjecture that these properties may translate to robustness against black box adversarial attacks that typically succeed in fooling state of the art classifiers Papernot et al. (2017).
To test this we first develop stochastic coordinate descent solvers for 01 loss based upon prior work Xie et al. (2019). We also extend the previous work to a non-linear single hidden layer 01 loss network that we call MLP01. For the task of binary classification on standard image recognition benchmarks we show that our linear 01 loss solver and the MLP01 loss network are both as accurate as their convex counterparts, namely the linear support vector machine and the logistic loss single hidden layer network. We then subject all methods to a substitute model black box attack Papernot et al. (2017) and find both our 01 loss models (linear and non-linear) to be more robust than hinge and logistic.
We find that on separable image datasets like MNIST our model offers little advantage and demonstrate under simulation why this happens. We then conduct adversarial training of our linear model and show it increases its robustness on MNIST and all other datasets while retaining clean test accuracy. We also show applications to deter street sign and facial recognition adversarial attacks. We describe below our methods followed by results and discussion.
The problem of determining the hyperplane with the minimum number of misclassifications in a binary classification problem is known to be NP-hard Ben-David et al. (2003). In mainstream machine learning literature this is called minimizing the 01 loss Shai et al. (2011), as given in Objective 1,

$$\min_{w, w_0} \frac{1}{2n} \sum_{i=1}^{n} \left(1 - \operatorname{sign}\!\left(y_i \left(w^T x_i + w_0\right)\right)\right), \qquad (1)$$

where $w \in \mathbb{R}^d$, $w_0 \in \mathbb{R}$ is our hyperplane solution and $x_i \in \mathbb{R}^d$, $y_i \in \{+1, -1\}$ are our training data. Convex losses such as least squares and the hinge Alpaydin (2004) can be considered as convex approximations to this problem that yield fast gradient descent solutions Bartlett et al. (2004). However, they are also more sensitive to outliers than the 01 loss Bartlett et al. (2004); Nguyen & Sanner (2013); Xie et al. (2019).
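Objective 1 can be evaluated directly without any smoothing; a minimal NumPy sketch (function and variable names are illustrative, labels assumed in {-1, +1}):

```python
import numpy as np

def zero_one_loss(w, w0, X, y):
    """Fraction of points misclassified by the hyperplane (w, w0).

    X: (n, d) data matrix, y: labels in {-1, +1}.
    A point counts as an error when y_i * (w^T x_i + w0) <= 0,
    which equals the (1/2n) * sum(1 - sign(...)) form of Objective 1.
    """
    margins = y * (X @ w + w0)
    return np.mean(margins <= 0)

# Toy check: a hyperplane separating x[0] < 0 from x[0] > 0.
X = np.array([[-2.0, 0.0], [-1.0, 1.0], [1.0, 0.5], [2.0, -1.0]])
y = np.array([-1, -1, 1, 1])
w = np.array([1.0, 0.0])
print(zero_one_loss(w, 0.0, X, y))  # -> 0.0
```

Note that the loss is piecewise constant in (w, w0), which is exactly why gradient descent does not apply and a combinatorial search is needed.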
We extend the 01 loss to a simple single hidden layer neural network with $m$ hidden nodes and sign activation that we call the MLP01 loss. This objective can be given as

$$\min_{W, W_0, w, w_0} \frac{1}{2n} \sum_{i=1}^{n} \left(1 - \operatorname{sign}\!\left(y_i \left(w^T \operatorname{sign}(W^T x_i + W_0) + w_0\right)\right)\right), \qquad (2)$$

where $W \in \mathbb{R}^{d \times m}$, $W_0 \in \mathbb{R}^m$ are the hidden layer parameters, $w \in \mathbb{R}^m$, $w_0 \in \mathbb{R}$ are the final layer node parameters, $x_i \in \mathbb{R}^d$, $y_i \in \{+1, -1\}$ are our training data, and $n$ is the number of training samples.
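The MLP01 forward pass and loss can likewise be evaluated directly; a minimal sketch with illustrative names, assuming labels in {-1, +1}:

```python
import numpy as np

def mlp01_loss(W, W0, w, w0, X, y):
    """01 loss of a single hidden layer network with sign activations.

    W: (d, m) hidden layer weights, W0: (m,) hidden biases,
    w: (m,) final node weights, w0: final node bias,
    X: (n, d) data, y: labels in {-1, +1}.
    """
    hidden = np.sign(X @ W + W0)      # sign activations of the m hidden nodes
    margins = y * (hidden @ w + w0)   # signed margin of the output node
    return np.mean(margins <= 0)

# Toy check: identity-like hidden layer, output reads the first hidden node.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1, -1])
W, W0 = np.eye(2), np.zeros(2)
w, w0 = np.array([1.0, 0.0]), 0.0
print(mlp01_loss(W, W0, w, w0, X, y))  # -> 0.0
```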
We solve both problems with stochastic coordinate descent based upon earlier work Xie et al. (2019).
2.1 Other work on 01 loss solvers
Aside from the stochastic coordinate descent of Xie et al. (2019) that we build upon, other attempts have been made to optimize the 01 loss. These include boosting Zhai et al. (2013), integer programming Tang et al. (2014), an approximation algorithm Shalev-Shwartz et al. (2011), a random coordinate descent method Li & Lin (2007), and a branch and bound method from 2013 that is the most recent Nguyen & Sanner (2013). These previous works cover various strategies to solve 01 loss but do not reach test accuracy on par with convex solvers on real data. We obtained a Matlab implementation of the branch and bound method and found it to be slow: it did not finish after several hours of runtime, as the authors also caution in their code. The random coordinate descent code Li & Lin (2007) requires GNU C compiler (gcc) version 3.0 to compile, whereas currently supported versions are above 4.0.
2.2 Stochastic coordinate descent (SCD01)
We use the stochastic coordinate descent for 01 loss Xie et al. (2019), called SCD01, to drive our linear and non-linear 01 loss solvers. In the Supplementary Material we fully describe the algorithm for reference, including the optimal threshold algorithm. Briefly, in each epoch we randomly select a subset of the training data and run coordinate descent on it. Our coordinate descent, shown in Algorithm 1, differs from previous work in how we update the coordinates. In previous work Xie et al. (2019) the authors update each coordinate until there is no change in the objective. We instead update each coordinate in a random pool by one step and pick the one with the greatest decrease in the objective (with ties decided randomly).
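The coordinate selection rule just described can be sketched as follows. Function names, the pool size, and the fixed step size are illustrative, and the optimal threshold update for the bias term is omitted:

```python
import numpy as np

def zero_one_loss(w, w0, X, y):
    """Fraction of points with y_i * (w^T x_i + w0) <= 0."""
    return np.mean(y * (X @ w + w0) <= 0)

def pool_coordinate_step(w, w0, X, y, pool_size=8, step=0.1, rng=None):
    """One step of the modified coordinate descent: try a single +/- step
    on each coordinate in a random pool and keep the single best move.
    Ties are decided by the random order of the shuffled pool."""
    rng = np.random.default_rng() if rng is None else rng
    best_loss = zero_one_loss(w, w0, X, y)
    best_w = w
    pool = rng.choice(len(w), size=min(pool_size, len(w)), replace=False)
    for j in pool:
        for delta in (step, -step):
            cand = w.copy()
            cand[j] += delta        # move one coordinate by one step
            loss = zero_one_loss(cand, w0, X, y)
            if loss < best_loss:    # keep only the greatest decrease
                best_loss, best_w = loss, cand
    return best_w, best_loss
```

Because the 01 loss is flat almost everywhere, evaluating whole candidate moves like this, rather than following a gradient, is the core of the search.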
2.3 Single hidden layer 01 loss network (MLP01)
We extend the stochastic coordinate descent solver to a single hidden layer network with m hidden nodes that we call MLP01 (see Algorithm 2). For each random batch of the training data we train the final node followed by each hidden node using our coordinate descent algorithm above (Algorithm 1). We set m = 20 hidden nodes in our experiments.
2.4 Majority vote 01 loss
Due to the non-uniqueness of 01 loss solutions and the randomness of our solvers, both our methods return different solutions when initialized with different seeds for the random number generator. Thus we take the majority vote of multiple runs, which we see as inherently necessary due to the nature of 01 loss. We call these methods SCD01majvote, which uses 100 votes, and MLP01majvote, which uses 32 votes (input in Algorithm 2).
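The vote combination across runs can be sketched as follows (the tie-breaking choice toward +1 is an illustrative convention, not one specified above):

```python
import numpy as np

def majority_vote(predictions):
    """Combine {-1, +1} predictions from multiple runs by majority vote.

    predictions: (votes, n) array, one row per independently seeded run.
    Ties, possible with an even number of votes, are broken toward +1
    here; this is an arbitrary illustrative choice.
    """
    totals = np.sum(predictions, axis=0)
    return np.where(totals >= 0, 1, -1)

# Three runs voting on three test points.
votes = np.array([[1, -1,  1],
                  [1, -1, -1],
                  [1,  1, -1]])
print(majority_vote(votes))  # -> [ 1 -1 -1]
```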
2.5 Adversarial training SCD01
We apply the basic iterative adversarial training described earlier Kurakin et al. (2016) to our SCD01 algorithm. The adversarial training objective is a min-max objective: we minimize the empirical risk under the maximum distortion of the input data. The iterative training proposed earlier and used by us below is a heuristic for this min-max problem.
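The alternation behind this heuristic can be sketched generically: fit, perturb, augment, refit. The `fit` and `perturb` callables and the round count below are placeholders supplied by the caller, not the exact procedure of Kurakin et al. (2016):

```python
import numpy as np

def adversarial_training(fit, perturb, X, y, rounds=10):
    """Heuristic iterative loop for the min-max adversarial objective:
    alternately fit a model on the accumulated data, then append
    adversarially perturbed copies of the original training points.

    fit(X, y) -> model and perturb(model, X, y) -> X_adv are supplied by
    the caller. Returns the model from every round, e.g. for a majority
    vote over rounds as we do for SCD01."""
    models = []
    X_aug, y_aug = X.copy(), y.copy()
    for _ in range(rounds):
        model = fit(X_aug, y_aug)          # inner minimization
        models.append(model)
        X_adv = perturb(model, X, y)       # approximate inner maximization
        X_aug = np.vstack([X_aug, X_adv])  # grow the training set
        y_aug = np.concatenate([y_aug, y])
    return models
```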
2.6 Black box adversarial attack
Since the 01 loss model has no gradient we cannot use white box gradient based attacks. Instead we resort to a black box strategy that uses the gradient of a substitute model to generate adversaries Papernot et al. (2017). In this setting we start with a small subset of the test data (200 samples) from which we iteratively learn the substitute model parameters. In each iteration (epoch) we generate adversaries from the remaining test data and attack the target model. This strategy can be effective as long as the substitute model is at least as accurate on the test data as the target model: this indicates that the substitute is accurately modeling the target, at least on the samples it is trained on, and so its gradient is likely to produce effective adversaries.
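The query/train/perturb loop can be sketched as below. For brevity the substitute here is logistic regression, whose gradient with respect to the input is just its weight vector, so the gradient sign step reduces to x' = x - eps * y * sign(w); this linear stand-in only illustrates the loop, whereas our experiments use a two hidden layer network substitute and the full attack of Papernot et al. (2017):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def black_box_attack(target_predict, X_seed, X_test, y_test, epochs=5, eps=0.3):
    """Sketch of the substitute model black box attack.

    target_predict is the black box (returns labels in {-1, +1}). Each
    epoch: label the substitute's data by querying the target, retrain
    the substitute, perturb the test points with an FGSM-style step from
    the substitute's gradient, and record the target's accuracy."""
    X_sub = X_seed.copy()
    accs = []
    for _ in range(epochs):
        y_sub = target_predict(X_sub)                    # query the black box
        sub = LogisticRegression(max_iter=1000).fit(X_sub, y_sub)
        w = sub.coef_.ravel()                            # input gradient direction
        X_adv = X_test - eps * y_test[:, None] * np.sign(w)
        accs.append(float(np.mean(target_predict(X_adv) == y_test)))
        X_sub = np.vstack([X_sub, X_adv])                # grow substitute data
    return accs
```

Against a linear target this simple loop already drives adversarial accuracy down, which is the behavior the substitute-model strategy relies on.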
For the substitute model we use a two hidden layer neural network with 200 nodes per layer, as in previous work Galloway et al. (2017). We implement this using the multilayer perceptron class in the Python scikit-learn toolkit Pedregosa et al. (2011). In the Supplementary Material we fully describe our black box attack model. We set the distortion ε in the adversarial images to one value for CIFAR10, STL10, and ImageNet and another for MNIST. In the Supplementary Material we provide results for lower values of ε and see similar results as here.
We study all methods for the task of binary classification of classes 0 and 1 on four popular image classification benchmarks. We compare our 01 loss methods to their convex counterparts on each benchmark. We run the LinearSVC algorithm (SVM) in the Python scikit-learn toolkit Pedregosa et al. (2011) with the regularization parameter C chosen by cross-validation. We use the Python scikit-learn multilayer perceptron with logistic loss (MLP) and one hidden layer of 20 nodes, the same as the MLP01 network. We train our MLP with stochastic gradient descent and learning parameters set to optimize the cross-validation accuracy. We also study a majority vote SVM and MLP by running them 100 times on bootstrapped samples (bagging) and find no improvement in robustness compared to the single runs. We also obtained the previous stochastic coordinate descent 01 loss solver Xie et al. (2019) to compare to our new one. All of our code and data are freely available from our GitHub site https://github.com/zero-one-loss/01loss.
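A minimal sketch of how these convex baselines can be set up in scikit-learn (the C grid and the SGD learning settings are illustrative placeholders, not our tuned values):

```python
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV

def make_baselines():
    """The convex baselines described above: a linear SVM with its
    regularization strength C cross-validated, and a single hidden layer
    MLP with 20 nodes trained by stochastic gradient descent."""
    svm = GridSearchCV(LinearSVC(),
                       {"C": [0.001, 0.01, 0.1, 1.0, 10.0]},  # illustrative grid
                       cv=5)
    mlp = MLPClassifier(hidden_layer_sizes=(20,), solver="sgd",
                        learning_rate_init=0.01, max_iter=2000,
                        random_state=0)
    return svm, mlp
```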
3.1 CIFAR10, STL10, and ImageNet
We start with results for binary classification of classes 0 and 1 in CIFAR10, STL10, and ImageNet. Between classes 0 and 1, CIFAR10 Krizhevsky (2009) has 10,000 training images and 2000 test images, and STL10 Coates et al. (2011) has 1000 training images and 1600 test images. In ImageNet classes 0 and 1 contain about 2580 training images and 100 test images. There we change the split so as to increase the test data size and better train the black box attack substitute model: we divide the original ImageNet training set into two parts, 1280 images for training and 1300 for test.
In Figure 1 we see that both our linear SCD01 and non-linear MLP01 models have comparable accuracy to the linear SVM and non-linear MLP but are much more robust. In Figure 2 we show clean and adversarial images generated by attacking MLP01, MLP, and SVM. We show adversarial images that are correctly classified by MLP01 but wrongly by MLP and SVM.
3.2 MNIST and simulation
While the above benchmarks are focused on image classification of arbitrary objects, the MNIST benchmark focuses on digit classification and is easier in comparison. Its test accuracy is typically above 99% for most classifiers. Between classes 0 and 1 (also digits 0 and 1) we have 12,665 training images and 2115 test images, each of size 28 × 28. In Figure 3 we see that SCD01 is not as robust as SVM, and MLP01 performs the same as MLP. We conjecture that this may be due to the non-uniqueness of 01 loss on easily separable classes such as those in MNIST. To understand this better we turn to simulated data.
In Figure 4 we show SCD01 and SVM boundaries on simple and complex simulated datasets. On the simple dataset (shown in Figure 4(a) and (b)) we see that the SCD01 boundary lies close to one class whereas the SVM boundary is centered to maximize the margin. This is due to the non-uniqueness of the 01 loss function: there are infinitely many solutions on the simple dataset and the search ends as soon as the loss value becomes zero. On the complex dataset, however, the SCD01 and SVM boundaries are similar.
Figure 4: (a) SCD01 simple, (b) SVM simple, (c) SCD01 complex, (d) SVM complex.
We conjecture that attacking SCD01 on a simple dataset would be easy because a convex substitute model (such as MLP and SVM) would have a boundary similar to the SVM's. Thus adversaries from its boundary are likely to succeed in attacking SCD01, whose boundary lies close to one class. Indeed we see in Figure 4(e) that SCD01 falls from 100% to 50% accuracy on the simple dataset after the first epoch of the black box attack whereas SVM remains at 100%. We now consider adversarial training to improve SCD01's robustness, particularly on simple MNIST-type datasets.
3.3 Adversarial training
In order to increase the robustness of SCD01 we run Algorithm 3 starting with a single run of SCD01 trained on the full training dataset. We then take the majority vote of all 100 classifiers learnt across the iterations. We also run the same algorithm with SVM in place of SCD01, and instead of voting across all 100 we use just the final model, as is typical in adversarial training Kurakin et al. (2016). We find the adversarially trained SCD01 is more robust than SCD01 and SCD01majvote on all datasets while retaining clean test accuracy. In Figure 5 we see the adversarially trained SCD01 on MNIST and CIFAR10 outperforms the versions trained on the clean data. It is comparable to the adversarially trained SVM on MNIST and better on CIFAR10.
3.4 Transferability of adversaries between 01 and convex loss
|Adversarial accuracy of model|Black box target model: MLP01 (32 votes)|MLP|SVM|
|MLP01 (32 votes)|59.7%|55.8%|60.8%|

|Adversarial accuracy of model|Black box target model: MLP01 (32 votes)|MLP|SVM|
|MLP01 (32 votes)|60.5%|58.5%|64.93%|
Adversarial samples are known to transfer between classifiers Papernot et al. (2016a). We find this is much less true for 01 loss adversaries. In Table 1 we see that adversaries targeting MLP01 also attack MLP and SVM, but to a lower extent than if we attacked MLP and SVM directly as the target model. Adversaries produced by attacking MLP and SVM transfer between each other but not to MLP01.
3.5 Runtimes, stability, and comparison to prior work
In Table 2 we see that our solver with majority vote (100 votes) is considerably faster than the previous one Xie et al. (2019) but still slower than the convex counterparts. These runtimes are measured on Intel Xeon Silver 4114 CPUs with NVIDIA Titan RTX 2080 GPUs.
We compare the adversarial accuracy of the previous 01 loss solver to ours on CIFAR10 and STL10 and find the robustness to be similar (see Supplementary Material for graph). This further supports our hypothesis of 01 loss robustness over convex ones since we see a high robustness across two different 01 loss solvers. In Table 3 we see that both SCD01 and MLP01 majority vote on the test data have low deviation suggesting that our results are stable and reproducible.
|SCD01 (100 votes)||MLP01 (32 votes)|
Table 3: Mean and standard deviation of the 100 votes of SCD01 and the 32 votes of MLP01.
3.6 Applications: street sign and facial recognition adversarial attacks
We now turn to two practical problems where adversarial attacks pose a threat. The first is street sign detection by autonomous vehicles Sitawarin et al. (2018) and the second is facial recognition, which is used by government and security systems. We consider 2816 train and 900 test images of street signs of 60 and 120 mph from the GTSRB street sign dataset Stallkamp et al. (2011), and 1000 train and 1000 test images of brown and black haired individuals from the CelebA facial recognition benchmark Liu et al. (2015). For GTSRB and for CelebA we use separate perturbation values ε in the black box attack. We show other values of ε in the Supplementary Material and make similar observations as here. In Figure 6 we see that MLP01 attains comparable accuracy to SVM and MLP but is more robust, as we saw on the earlier benchmarks. We show sample adversarial images in Figure 7.
Our results show that a convex substitute model (like the multilayer perceptron that we use) can generate effective adversaries for other convex models like SVM and MLP but not so much for 01 loss models like our SCD01 and MLP01. We ask two follow-up questions. (1) Can we attack the 01 loss with a 01 loss substitute model? (2) Was the multilayer perceptron substitute model in the black box attack correctly trained? To answer the first question we use SCD01 as the substitute model in our black box attack.
In Figure 8 we see the results of attacking SCD01 and SVM with SCD01 as the substitute model in the black box attack. We use the same seed for the random number generator in SCD01 for both the target and substitute model to avoid differences due to randomness. We attack a single run of SCD01 rather than the majority vote.
We see that the black box attack with SCD01 as the substitute model fails to attack any of the models even though the accuracy of the substitute on test data is high during training and the distortion is set to a high value. We argue this is because of the non-unique nature of 01 loss: there can be infinitely many solutions all yielding the same local minimum in the 01 loss search space. Thus when we learn an SCD01 single vote model as the substitute we find it cannot approximate, and therefore cannot successfully attack, even the same SCD01 trained on the clean data with the same random number generator seed.
Of course we can attack SCD01 if we know its model parameters: we simply generate adversaries directly from the known hyperplane and these will fool the SCD01 single vote. In Table 4 we see that SCD01 adversaries generated this way fool the SCD01 classifier and also transfer to the SVM to some degree. This would be a white box attack, though. If the model parameters are kept hidden, or the model is retrained, we see that both convex and 01 loss substitute models find it hard to attack 01 loss.
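With the hyperplane parameters in hand, the standard white box perturbation for a linear model is to push each point against its own label along the hyperplane normal. The sketch below illustrates that construction; the exact perturbation used in our experiments may differ:

```python
import numpy as np

def white_box_adversary(w, w0, X, y, eps):
    """White box adversaries against a known hyperplane (w, w0):
    x' = x - eps * y * w / ||w||, i.e. move each point a distance eps
    against its own label along the unit normal of the hyperplane."""
    return X - eps * y[:, None] * (w / np.linalg.norm(w))

# Points at distance 2 from the hyperplane x[0] = 0, pushed by eps = 3.
w, w0 = np.array([1.0, 0.0]), 0.0
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1, -1])
X_adv = white_box_adversary(w, w0, X, y, eps=3.0)
print(np.sign(X_adv @ w + w0))  # -> [-1.  1.], flipped relative to y
```

Any point closer than eps to the hyperplane is pushed across it, so the attacked model misclassifies every such point.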
|SCD01 (white box)||.19 (.83)||.79 (.82)||.86 (.86)||.87 (.89)|
For the second question we look at the accuracy of our multilayer perceptron substitute model in each epoch of the black box attack. An accurately trained substitute model indicates that our training was successful and its gradient is likely to be an effective generator of adversaries. Indeed we see in Figure 9 that the black box substitute model accuracies while attacking SCD01, MLP01, SVM, and MLP on CIFAR10 are similar to the clean test accuracies of the target models, suggesting we have a well-trained substitute. We see the same trend on all datasets.
We have shown that substitute model black box attacks are not very effective against 01 loss models when the substitute model is convex or 01 loss. There are however other black box attack methods that rely on just labels and try to estimate the minimum distortion of adversarial examples (an NP-hard problem) Chen et al. (2017); Brendel et al. (2017); Chen et al. (2019). We obtained implementations of the Boundary Attack Brendel et al. (2017) and the HopSkipJump attack Chen et al. (2019) to determine the minimum adversarial distortion of our SCD01 boundary. Both of these start with an adversarial example and make incremental changes until the example is just at the boundary of the target model. In our initial attempts both implementations crashed before reaching convergence or their default maximum iterations when attacking our SCD01 and SCD01majvote models. While we need to revisit these attacks, both are slow even for a single example. This is not surprising since finding the minimum distortion is an NP-hard problem and thus hard to solve in practice.
We measure the distances between the adversarial and clean images shown in Figure 2, plus the CelebA images in Figure 7, and average them: MLP01 = 11.54, MLP = 11.5, SVM = 11.5. Thus even though MLP01 adversaries have a slightly higher distortion, they are still correctly classified by MLP01.
We have not shown the effect of different parameters on SCD01 and MLP01 because our focus here is on adversarial attacks on 01 loss. We determined our parameters by optimizing accuracy on the test dataset and then fixed them for the adversarial attacks. While our SCD01 and MLP01 are possibly the fastest 01 loss solvers that we know of, their runtimes are still higher than those of SVM and MLP. Thus speeding up our algorithms by parallelization is a key part of future work.
There are several other avenues we could explore going forward. The first is multi-class classification so that we can evaluate 01 loss on full image benchmarks. In previous work Xie et al. (2019) a one-vs-one strategy with 10 votes showed promising but limited results. We instead plan to add more nodes to the final layer of MLP01 and rely on majority vote for classification. Even though our results here are for classes 0 and 1 we expect similar trends on other pairs of classes in the benchmarks. Besides a multi-class network we may explore 01 loss convolutions in an attempt to match the accuracy of convolutional neural networks. This is computationally much harder but perhaps possible by extending the stochastic coordinate descent as we have in this study.
We briefly touch upon adversarial training in this paper and plan to explore it separately. In particular we plan to study adversarially trained SVM and MLP, and explore iterative training for SCD01 more thoroughly. It is unclear how to generate white box adversaries for MLP01 and so a naive iterative training like we did for SCD01 will not work there. One strategy is to use gradient free black box attacks but runtime may be a problem. If we can successfully adversarially train MLP01 it may become more robust than what we demonstrate here.
- Alpaydin (2004) Alpaydin, E. Machine Learning. MIT Press, 2004.
- Bartlett et al. (2004) Bartlett, P. L., Jordan, M. I., and Mcauliffe, J. D. Large margin classifiers: Convex loss, low noise, and convergence rates. In Thrun, S., Saul, L., and Schölkopf, B. (eds.), Advances in Neural Information Processing Systems 16, pp. 1173–1180. MIT Press, 2004.
- Ben-David et al. (2003) Ben-David, S., Eiron, N., and Long, P. M. On the difficulty of approximately maximizing agreements. Journal of Computer and System Sciences, 66(3):496–514, 2003.
- Brendel et al. (2017) Brendel, W., Rauber, J., and Bethge, M. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv preprint arXiv:1712.04248, 2017.
- Carlini & Wagner (2017) Carlini, N. and Wagner, D. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017.
- Chen et al. (2019) Chen, J., Jordan, M. I., and Wainwright, M. J. Hopskipjumpattack: A query-efficient decision-based attack. arXiv preprint arXiv:1904.02144, 3, 2019.
- Chen et al. (2017) Chen, P.-Y., Zhang, H., Sharma, Y., Yi, J., and Hsieh, C.-J. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 15–26, 2017.
- Coates et al. (2011) Coates, A., Ng, A., and Lee, H. An analysis of single-layer networks in unsupervised feature learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics, pp. 215–223, 2011.
- Cortes & Vapnik (1995) Cortes, C. and Vapnik, V. Support-vector networks. Machine learning, 20(3):273–297, 1995.
- Galloway et al. (2017) Galloway, A., Taylor, G. W., and Moussa, M. Attacking binarized neural networks. arXiv preprint arXiv:1711.00449, 2017.
- Ghosh et al. (2015) Ghosh, A., Manwani, N., and Sastry, P. Making risk minimization tolerant to label noise. Neurocomputing, 160:93–107, 2015.
- Goodfellow et al. (2014) Goodfellow, I. J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
- Hu et al. (2016) Hu, W., Niu, G., Sato, I., and Sugiyama, M. Does distributionally robust supervised learning give robust classifiers? arXiv preprint arXiv:1611.02041, 2016.
- Krizhevsky (2009) Krizhevsky, A. Learning multiple layers of features from tiny images. 2009.
- Krizhevsky et al. (2012) Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105, 2012.
- Kurakin et al. (2016) Kurakin, A., Goodfellow, I., and Bengio, S. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
- Li & Lin (2007) Li, L. and Lin, H.-T. Optimizing 0/1 loss for perceptrons by random coordinate descent. In Neural Networks, 2007. IJCNN 2007. International Joint Conference on, pp. 749–754. IEEE, 2007.
- Liu et al. (2015) Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV), December 2015.
- Lyu & Tsang (2019) Lyu, Y. and Tsang, I. W. Curriculum loss: Robust learning and generalization against label corruption. arXiv preprint arXiv:1905.10045, 2019.
- Manwani & Sastry (2013) Manwani, N. and Sastry, P. Noise tolerance under risk minimization. IEEE transactions on cybernetics, 43(3):1146–1151, 2013.
- Nguyen & Sanner (2013) Nguyen, T. and Sanner, S. Algorithms for direct 0–1 loss optimization in binary classification. In Proceedings of The 30th International Conference on Machine Learning, pp. 1085–1093, 2013.
- Papernot et al. (2016a) Papernot, N., McDaniel, P., and Goodfellow, I. Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277, 2016a.
- Papernot et al. (2016b) Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., and Swami, A. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. IEEE, 2016b.
- Papernot et al. (2017) Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., and Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519. ACM, 2017.
- Pedregosa et al. (2011) Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
- Raghunathan et al. (2019) Raghunathan, A., Xie, S. M., Yang, F., Duchi, J. C., and Liang, P. Adversarial training can hurt generalization. arXiv preprint arXiv:1906.06032, 2019.
- Shai et al. (2011) Shai, S.-S., Shamir, O., and Sridharan, K. Learning linear and kernel predictors with the 0-1 loss function. IJCAI Proceedings-International Joint Conference on Artificial Intelligence, 22(3), 2011.
- Shalev-Shwartz et al. (2011) Shalev-Shwartz, S., Shamir, O., and Sridharan, K. Learning linear and kernel predictors with the 0-1 loss function, 2011.
- Sitawarin et al. (2018) Sitawarin, C., Bhagoji, A. N., Mosenia, A., Chiang, M., and Mittal, P. Darts: Deceiving autonomous cars with toxic signs. arXiv preprint arXiv:1802.06430, 2018.
- Stallkamp et al. (2011) Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. The German Traffic Sign Recognition Benchmark: A multi-class classification competition. In IEEE International Joint Conference on Neural Networks, pp. 1453–1460, 2011.
- Szegedy et al. (2013) Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., and Fergus, R. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- Tang et al. (2014) Tang, Y., Li, X., Xu, Y., Liu, S., and Ouyang, S. A mixed integer programming approach to maximum margin 0–1 loss classification. In 2014 International Radar Conference, pp. 1–6. IEEE, 2014.
- Xie et al. (2019) Xie, M., Xue, Y., and Roshan, U. Stochastic coordinate descent for 0/1 loss and its sensitivity to adversarial attacks. In Proceedings of the 18th IEEE International Conference on Machine Learning and Applications (ICMLA 2019), to appear, 2019.
- Zhai et al. (2013) Zhai, S., Xia, T., Tan, M., and Wang, S. Direct 0-1 loss minimization and margin maximization with boosting. In Burges, C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K. (eds.), Advances in Neural Information Processing Systems 26, pp. 872–880. Curran Associates, Inc., 2013.
- Zhang et al. (2019) Zhang, H., Yu, Y., Jiao, J., Xing, E. P., Ghaoui, L. E., and Jordan, M. I. Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573, 2019.