secml: A Python Library for Secure and Explainable Machine Learning

Marco Melis et al., Università di Cagliari, December 20, 2019

We present secml, an open-source Python library for secure and explainable machine learning. It implements the most popular attacks against machine learning, including not only test-time evasion attacks to generate adversarial examples against deep neural networks, but also training-time poisoning attacks against support vector machines and many other algorithms. These attacks enable evaluating the security of learning algorithms and of the corresponding defenses under both white-box and black-box threat models. To this end, secml provides built-in functions to compute security evaluation curves, showing how quickly classification performance decreases against increasing adversarial perturbations of the input data. secml also includes explainability methods to help understand why adversarial attacks succeed against a given model, by visualizing the most influential features and training prototypes contributing to each decision. It is distributed under the Apache License 2.0, and hosted at https://gitlab.com/secml/secml.


1 Introduction

Machine learning has been shown to be vulnerable to well-crafted attacks, including test-time evasion (i.e., adversarial examples) and training-time poisoning attacks (Huang et al., 2011; Biggio et al., 2013, 2012; Szegedy et al., 2014; Papernot et al., 2016; Carlini and Wagner, 2017; Biggio and Roli, 2018). The main idea behind such adversarial attacks, first explored by Biggio et al. (2012, 2013), has been to formalize them as constrained optimization problems and to use gradient-based solvers to generate the corresponding attack samples (see, e.g., Biggio and Roli, 2018; Joseph et al., 2018). However, this research field boomed only after Szegedy et al. (2014) independently discovered the same kind of vulnerability in state-of-the-art deep neural networks (DNNs) for image classification. Since then, adversarial examples have been demonstrated in many different domains, including speech recognition and malware detection. Despite the large number of papers published in this research area, properly evaluating the security of learning algorithms and designing effective defenses against adversarial attacks remain two challenging open issues.

In this work, we present secml, an open-source Python library that aims to help tackle the aforementioned issues and foster the development of more secure learning algorithms. To this end, secml implements: (i) a methodology for the empirical security evaluation of machine-learning algorithms under different evasion and poisoning attack scenarios; and (ii) explanation methods to help understand why and how adversarial attacks are able to subvert the decisions provided by machine-learning algorithms. With respect to other popular libraries that implement attacks almost solely against DNNs (Papernot et al., 2018; Rauber et al., 2017; Nicolae et al., 2018), secml also implements training-time poisoning attacks and computationally-efficient test-time evasion attacks against many different algorithms, including support vector machines (SVMs) and random forests (RFs). It also incorporates both the feature-based and prototype-based explanation methods proposed by Ribeiro et al. (2016), Sundararajan et al. (2017), and Koh and Liang (2017).
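
As a concrete illustration of the first point, a security evaluation curve simply reports classification accuracy as a function of an increasing perturbation budget. The following is a minimal sketch of such a loop in plain PyTorch, not the secml API; the FGSM attack used here is only a stand-in for the attacks shipped with the library.

# Minimal sketch of a security evaluation curve (accuracy vs. perturbation
# budget). Not the secml API; FGSM is used as a simple placeholder attack.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    # single-step L-infinity attack, used only to illustrate the loop
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def security_evaluation(model, x, y, eps_values):
    # returns one accuracy value per perturbation budget in eps_values
    accuracies = []
    for eps in eps_values:
        x_adv = x if eps == 0 else fgsm(model, x, y, eps)
        with torch.no_grad():
            acc = (model(x_adv).argmax(dim=1) == y).float().mean().item()
        accuracies.append(acc)
    return accuracies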

2 secml: Architecture and Implementation

Figure 1: Main packages and architecture of secml.

secml has a modular architecture oriented to code reuse. We have defined abstract interfaces for all components, including loss functions, regularizers, optimizers, classifiers, and attacks. By separating the definition of the optimization problem from the algorithm used to solve it, one can easily define novel attacks or classifiers (in terms of constrained optimization problems) and then use different optimizers to obtain a solution. This is a great advantage with respect to other libraries like CleverHans (Papernot et al., 2018) as, e.g., we can switch from white- to black-box attacks by just changing the optimizer (from a gradient-based to a gradient-free solver), without re-defining the entire optimization problem.
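
The design idea can be sketched as follows; class and function names are ours and purely illustrative, not the secml API. The attack objective is defined once, and either a gradient-based (white-box) or a gradient-free (black-box) solver is applied to it.

# Hypothetical sketch of separating the attack objective from the solver.
import numpy as np

class EvasionObjective:
    """Loss to minimize within a box constraint; the gradient is optional."""
    def __init__(self, loss_fn, grad_fn=None, lb=0.0, ub=1.0):
        self.loss_fn, self.grad_fn = loss_fn, grad_fn
        self.lb, self.ub = lb, ub
    def loss(self, x):
        return self.loss_fn(x)
    def gradient(self, x):
        return self.grad_fn(x)
    def project(self, x):
        return np.clip(x, self.lb, self.ub)

def gradient_solver(obj, x0, step=0.01, n_iter=100):
    # white-box: follow the analytical gradient of the objective
    x = x0.copy()
    for _ in range(n_iter):
        x = obj.project(x - step * obj.gradient(x))
    return x

def random_search_solver(obj, x0, step=0.01, n_iter=1000, rng=np.random):
    # black-box: only loss queries, no gradient information
    x, best = x0.copy(), obj.loss(x0)
    for _ in range(n_iter):
        cand = obj.project(x + step * rng.uniform(-1, 1, size=x.shape))
        if obj.loss(cand) < best:
            x, best = cand, obj.loss(cand)
    return x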

secml integrates different components via well-designed wrapper classes. We have integrated many attack implementations from CleverHans, and extended them to also record the values of the loss function and the intermediate points optimized during the attack iterations, as well as the number of function and gradient evaluations. This is useful to debug and compare different attacks, e.g., by checking their convergence to a local optimum, and to properly tune their hyperparameters (e.g., step size and number of iterations).
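
Conceptually, this bookkeeping amounts to a thin wrapper that counts function and gradient evaluations and records the loss and the points queried during the attack; the sketch below is illustrative and does not reproduce secml's internals.

# Illustrative sketch (not secml's internals): wrap any objective exposing
# loss()/gradient() so that every call is counted and the path is stored.
class DebuggedObjective:
    def __init__(self, objective):
        self.objective = objective
        self.n_fun_evals = 0
        self.n_grad_evals = 0
        self.loss_path = []   # loss value at each queried point
        self.x_path = []      # intermediate points visited by the attack
    def loss(self, x):
        self.n_fun_evals += 1
        value = self.objective.loss(x)
        self.loss_path.append(value)
        self.x_path.append(x.copy())
        return value
    def gradient(self, x):
        self.n_grad_evals += 1
        return self.objective.gradient(x)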

secml supports DNNs via a dedicated PyTorch wrapper, which can be extended to include other popular deep-learning frameworks, like TensorFlow and Keras. This allows us to run attacks natively implemented in CleverHans against PyTorch models as well.
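
A minimal sketch of what such a wrapper needs to expose is given below: decision scores and input gradients of a PyTorch model, with numpy arrays as the exchange format so that framework-agnostic attack code can drive the model. The class and method names are our own assumptions, not the actual secml wrapper.

# Minimal sketch (not the actual secml wrapper) of exposing a PyTorch model
# to framework-agnostic attack code via numpy arrays.
import numpy as np
import torch

class TorchModelWrapper:
    def __init__(self, model):
        self.model = model.eval()
    def decision_function(self, x_np):
        # forward pass only: returns the class scores as a numpy array
        x = torch.from_numpy(x_np).float()
        with torch.no_grad():
            return self.model(x).numpy()
    def grad_score_x(self, x_np, y):
        # gradient of the score of class y w.r.t. the input
        x = torch.from_numpy(x_np).float().requires_grad_(True)
        score = self.model(x)[:, y].sum()
        score.backward()
        return x.grad.numpy()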

Main packages. The library is organized in different packages, as depicted in Fig. 1. The adv package implements different adversarial attacks and provides the functionalities to perform security evaluations. It encompasses the evasion attacks provided by CleverHans, as well as our implementations of evasion and poisoning attacks (Biggio and Roli, 2018). The ml package imports classifiers from scikit-learn and DNNs from PyTorch. We have extended scikit-learn classifiers with the gradients required to run evasion and poisoning attacks, which we have implemented analytically. Our library also supports chaining different modules (e.g., scalers and classifiers) and can automatically compute the corresponding end-to-end gradient via the chain rule. The explanation package implements the feature- and prototype-based explanation methods by Ribeiro et al. (2016), Sundararajan et al. (2017), and Koh and Liang (2017). The optim package provides an implementation of the projected gradient descent (PGD) algorithm, along with a more efficient version that runs a bisection line search along the gradient direction (PGD-LS) to reduce the number of gradient evaluations (for more details, see Demontis et al., 2019). Finally, data provides data loaders for popular datasets, integrating those provided by scikit-learn and PyTorch; array provides a higher-level interface for both dense (numpy) and sparse (scipy) arrays, enabling the efficient execution of attacks on sparse data representations; figure implements some advanced plotting functions based on matplotlib (e.g., to visualize and debug attacks); and utils provides functionalities for logging and parallel code execution.
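
The end-to-end gradient of a module chain is obtained by composing the gradients (Jacobians) of the individual modules via the chain rule. A minimal numpy sketch for a standardization scaler followed by a linear classifier, not the secml implementation, is shown below.

# Minimal numpy sketch (not the secml implementation) of composing module
# gradients via the chain rule for a scaler followed by a linear classifier.
import numpy as np

class Scaler:
    def __init__(self, mean, std):
        self.mean, self.std = mean, std
    def forward(self, x):
        return (x - self.mean) / self.std
    def jacobian(self, x):
        return np.diag(1.0 / self.std)    # d z / d x (diagonal)

class LinearClassifier:
    def __init__(self, w, b):
        self.w, self.b = w, b
    def forward(self, z):
        return self.w @ z + self.b        # scalar decision score
    def gradient(self, z):
        return self.w                     # d score / d z

def end_to_end_gradient(scaler, clf, x):
    # chain rule: d score / d x = (d z / d x)^T (d score / d z)
    z = scaler.forward(x)
    return scaler.jacobian(x).T @ clf.gradient(z)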

Testing and documentation. We have run extensive tests on macOS, Ubuntu 16.04, and Debian 9 and 10, via a dedicated continuous-integration server. We have also successfully run some preliminary tests on Windows 10. The user documentation is available at https://secml.gitlab.io, along with a basic developer guide detailing how to extend the ml package with other classifiers and deep-learning frameworks. The complete set of unit tests will be released in upcoming versions of the library. Many Python notebooks are already available, including the tutorial presented in the next section.

3 Evasion Attacks on ImageNet

We show here how to use secml to run different evasion attacks against ResNet-18, a DNN pretrained on ImageNet and available from torchvision. This usage example aims to demonstrate that secml also enables running CleverHans attacks (implemented in TensorFlow) against PyTorch models. In particular, we aim to have the race car depicted in Fig. 2 misclassified as a tiger, using the ℓ2-norm targeted implementations of the Carlini-Wagner (CW) attack (from CleverHans) and of our PGD attack. We also consider a variant of our PGD attack, referred to here as PGD-patch, where we restrict the attacker to only change the pixels of the image corresponding to the license plate, using a box constraint (Melis et al., 2017).
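
The PGD-patch idea can be sketched as a targeted PGD loop in which the perturbation is masked to a region of the image. The code below is an illustrative sketch in plain PyTorch, not the exact secml or CleverHans implementations; the target class index and the mask are placeholders, and the projection onto the perturbation budget and the valid input range is omitted for brevity.

# Sketch of a targeted, masked PGD attack on a torchvision ResNet-18
# (illustrative only; not the exact secml or CleverHans implementations).
import torch
import torch.nn.functional as F
from torchvision import models

# newer torchvision versions use the weights= argument instead of pretrained=
model = models.resnet18(pretrained=True).eval()

def targeted_pgd(x, target, step=0.01, n_iter=50, mask=None):
    # x: (1, 3, H, W) preprocessed image; target: desired class index;
    # mask: optional (1, 1, H, W) tensor restricting which pixels may change
    x_adv = x.clone().detach()
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), torch.tensor([target]))
        grad = torch.autograd.grad(loss, x_adv)[0]
        if mask is not None:
            grad = grad * mask            # only perturb pixels inside the patch
        # descend on the targeted loss; real attacks also project onto the
        # perturbation budget and the valid input range at each step
        x_adv = (x_adv - step * grad.sign()).detach()
    return x_adv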

Figure 2: Adversarial images (CW, PGD, and PGD-patch) representing a race car misclassified as a tiger. For PGD-patch, we also report explanations via integrated gradients.
Figure 3: Attack optimization. Left: loss minimization; Right: confidence of source class (race car, dashed lines) vs confidence of target class (tiger, solid lines), across iterations.

Experimental settings. We run all attacks for a fixed number of iterations, adjusting the step size to reach convergence within this budget. For CW, we set the confidence parameter high enough to generate high-confidence misclassifications, which yields a given ℓ2 perturbation size. We bound PGD to create an adversarial image with the same perturbation size. For PGD-patch, we do not bound the perturbation size of the pixels that can be modified.

Results. The resulting adversarial images are shown in Fig. 2. For PGD-patch, we also highlight the most relevant pixels used by the DNN to classify this image as a tiger, using integrated gradients as the explanation method. The most relevant pixels are found around the perturbed region containing the license plate, unveiling the presence of a potential adversarial manipulation. Finally, in Fig. 3, we report some plots to better understand the attack optimization process. The leftmost plot shows how the attack losses (rescaled linearly to enable comparison) are minimized as the attacks iterate. The rightmost plot shows, for each attack, how the confidence assigned to the class race car (dashed line) decreases in favor of the confidence assigned to the class tiger (solid line) across iterations. We have found these plots particularly useful to tune the attack hyperparameters (e.g., step size and number of iterations) and to check their convergence to a good local optimum. We firmly believe that such visualizations will help avoid common pitfalls in the security evaluation of learning algorithms, facilitating understanding and configuration of the attack algorithms.
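
Integrated gradients, used above to explain the PGD-patch image, attributes the score of a class to the input pixels by accumulating gradients along a straight path from a baseline to the input. The following is a minimal PyTorch sketch of the method, not the implementation in the secml explanation package.

# Minimal PyTorch sketch of integrated gradients (ours, not the secml
# explanation package): attribute the score of `target` to the input pixels.
import torch

def integrated_gradients(model, x, target, baseline=None, steps=50):
    # x: (1, C, H, W) input; the baseline defaults to an all-zero image
    baseline = torch.zeros_like(x) if baseline is None else baseline
    total_grad = torch.zeros_like(x)
    for i in range(1, steps + 1):
        # point on the straight line between the baseline and the input
        x_step = (baseline + (i / steps) * (x - baseline)).requires_grad_(True)
        score = model(x_step)[:, target].sum()
        total_grad += torch.autograd.grad(score, x_step)[0]
    # average gradient times the input-baseline difference
    return (x - baseline) * total_grad / steps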

4 Conclusions and Future Work

The secml project was born more than five years ago. Even though we only open-sourced it in August 2019, it has already attracted more than 700 users (see download statistics at https://pypistats.org/packages/secml). Thanks to the help of this emerging community of users and developers, we firmly believe that secml will soon become a reference tool to evaluate the security of machine-learning algorithms. We are constantly working to enrich it with new functionalities, by adding novel defenses, wrappers for other third-party libraries, and more pretrained models to the secml model zoo.

Acknowledgements

This work has been partly supported by the PRIN 2017 project RexLearn, funded by the Italian Ministry of Education, University and Research (grant no. 2017TWNMH2); and by the project ALOHA, under the European Union’s H2020 programme (grant no. 780788).

References

  • Biggio and Roli (2018) B. Biggio and F. Roli. Wild patterns: Ten years after the rise of adversarial machine learning. Pattern Recognition, 84:317–331, 2018.
  • Biggio et al. (2012) B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support vector machines. In J. Langford and J. Pineau, editors, 29th ICML, pages 1807–1814. Omnipress, 2012.
  • Biggio et al. (2013) B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In ECML PKDD, Part III, volume 8190 of LNCS, pages 387–402. Springer Berlin Heidelberg, 2013.
  • Carlini and Wagner (2017) N. Carlini and D. A. Wagner. Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy, pages 39–57. IEEE Computer Society, 2017.
  • Demontis et al. (2019) A. Demontis, M. Melis, M. Pintor, M. Jagielski, B. Biggio, A. Oprea, C. Nita-Rotaru, and F. Roli. Why do adversarial attacks transfer? Explaining transferability of evasion and poisoning attacks. In 28th USENIX Sec. Symp. USENIX Association, 2019.
  • Huang et al. (2011) L. Huang, A. D. Joseph, B. Nelson, B. Rubinstein, and J. D. Tygar. Adversarial machine learning. In 4th ACM Workshop AISec, pages 43–57, Chicago, IL, USA, 2011.
  • Joseph et al. (2018) A. D. Joseph, B. Nelson, B. I. P. Rubinstein, and J. Tygar. Adversarial Machine Learning. Cambridge University Press, 2018.
  • Koh and Liang (2017) P. W. Koh and P. Liang. Understanding black-box predictions via influence functions. In International Conference on Machine Learning (ICML), 2017.
  • Melis et al. (2017) M. Melis, A. Demontis, B. Biggio, G. Brown, G. Fumera, and F. Roli. Is deep learning safe for robot vision? Adversarial examples against the iCub humanoid. In ICCVW Vision in Practice on Autonomous Robots (ViPAR), pages 751–759. IEEE, 2017.
  • Nicolae et al. (2018) M.-I. Nicolae, M. Sinn, M. N. Tran, B. Buesser, A. Rawat, M. Wistuba, V. Zantedeschi, N. Baracaldo, B. Chen, H. Ludwig, I. Molloy, and B. Edwards. Adversarial robustness toolbox v1.0.1. arXiv preprint arXiv:1807.01069, 2018.
  • Papernot et al. (2016) N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami. The limitations of deep learning in adversarial settings. In 1st IEEE Euro SP, pages 372–387. IEEE, 2016.
  • Papernot et al. (2018) N. Papernot, F. Faghri, N. Carlini, I. Goodfellow, et al. Technical Report on the CleverHans v2.1.0 Adversarial Examples Library. arXiv preprint arXiv:1610.00768, 2018.
  • Rauber et al. (2017) J. Rauber, W. Brendel, and M. Bethge. Foolbox: A python toolbox to benchmark the robustness of machine learning models. arXiv preprint arXiv:1707.04131, 2017.
  • Ribeiro et al. (2016) M. T. Ribeiro, S. Singh, and C. Guestrin. “Why should I trust you?”: Explaining the predictions of any classifier. In KDD, pages 1135–1144, New York, NY, USA, 2016. ACM.
  • Sundararajan et al. (2017) M. Sundararajan, A. Taly, and Q. Yan. Axiomatic Attribution for Deep Networks. In ICML, pages 3319–3328, July 2017.
  • Szegedy et al. (2014) C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In ICLR, 2014.