With millions of malicious applications discovered in the wild, Android malware constitutes one of the major threats in mobile security. Among the various detection strategies proposed by companies and academic researchers, those based on machine learning have shown the most promising results, due to their flexibility against malware variants and obfuscation attempts [1, 7]. Despite the impressive performances reported by such approaches on benchmark datasets, the problem of Android malware detection in the wild is still far from being solved. The validity of such optimistic, in-vitro evaluations has indeed been questioned by recent adversarial analyses showing that only a few changes to the content of a malicious Android application may suffice to evade detection by a learning-based detector [8, 6]. Besides this fragility to well-crafted evasion attacks (a.k.a. adversarial examples) [5, 4, 18, 10], Sommer and Paxson have more generally questioned the suitability of black-box machine-learning approaches to computer security. In particular, how can we trust the predictions of a machine-learning model in vivo, i.e., when it is deployed in an operating environment, to take subsequent reliable actions? How can we understand whether we are selecting a proper model before deployment? And what about its security properties against adversarial attacks?
To partially address these issues, Android malware detectors often restrict themselves to the use of linear, explainable machine-learning models that allow one to easily identify the most influential features contributing to each decision (Sect. II) [1, 2]. More generally, interpretability of machine-learning models has recently become a relevant research direction to more thoroughly address and mitigate the aforementioned issues, especially in the case of nonlinear black-box machine-learning algorithms [13, 3, 12, 14, 9]. Some approaches aim to explain local predictions (i.e., on each specific sample) by identifying the most influential features [3, 14] or prototypes from the training data. Others have proposed techniques and methodologies towards providing global explanations about the salient characteristics learned by a given machine-learning algorithm [13, 9].
In this work, we generalize current explainable Android malware detection approaches to any black-box machine-learning model, by leveraging a gradient-based approach to identify the most influential local features (Sect. III). For non-differentiable learning algorithms, like decision trees, we extract gradient information by learning a differentiable approximation. Notably, this idea was originally exploited to construct gradient-based evasion attacks against non-differentiable learners, and to evaluate their transferability, i.e., the probability that an attack crafted against a learning algorithm succeeds against a different one [4, 10, 15]. Accordingly, our approach provides interpretable decisions even for Android malware detectors that exploit nonlinear learning algorithms to potentially increase detection accuracy. Moreover, by averaging the locally-relevant features across different classes of samples, our approach also allows highlighting the global characteristics learned by a given model to identify benign applications and different classes of Android malware.
We empirically evaluate our approach on Drebin, an Android malware detector that extracts information from the Android application through static analysis, and provides interpretable decisions by leveraging a linear classification algorithm. To test the validity of our approach, we show how to retain the interpretability of Drebin on nonlinear algorithms, including Support Vector Machines (SVMs) and Random Forests (RFs). Interestingly, we also show that the interpretations provided by our approach can help identify potential vulnerabilities of both linear and nonlinear Android malware detectors against adversarial manipulations.
We conclude the paper by discussing contributions and limitations of this work, and future research directions towards developing more robust malware detectors (Sect. V).
II Android Malware Detection
In this section, we provide some background on how Android applications are structured, and then discuss Drebin, the malware detector used in our analysis.
II-A Android Background
Android applications are apk files, i.e., zipped archives that must contain two files: the Android manifest and the classes.dex. Additional xml and resource files are respectively used to define the application layout and to provide multimedia contents. As Drebin only analyzes the Android manifest and classes.dex files, we briefly describe them below.
Android Manifest. The manifest file holds information about how the application is organized in terms of its components, i.e., parts of code that perform specific actions; e.g., one component might be associated with a screen visualized by the user (an activity) or with the execution of audio in the background (a service). It is also possible to perform actions on the occurrence of a specific event (a receiver). The actions of each component are further specified through filtered intents; e.g., when a component sends data to other applications, or is invoked by a browser. Special types of intent filters (e.g., LAUNCHER) can specify that a certain component is executed as soon as the application is opened. The manifest also contains the list of hardware components and permissions requested by the application to work (e.g., Internet access).
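As an illustration of how such manifest entries can be harvested, the sketch below parses a decoded (plain-text) AndroidManifest.xml; the feature-name prefixes (permission::, component::, intent::) are hypothetical, and the binary manifest is assumed to have been decoded beforehand (e.g., with apktool):

```python
import xml.etree.ElementTree as ET

# android: attributes are namespace-qualified in the parsed tree
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

def extract_manifest_features(manifest_xml: str) -> set:
    """Collect permission, component, and intent-action strings
    from a decoded AndroidManifest.xml."""
    root = ET.fromstring(manifest_xml)
    features = set()
    for perm in root.iter("uses-permission"):
        features.add("permission::" + perm.get(ANDROID_NS + "name", ""))
    for tag in ("activity", "service", "receiver"):
        for comp in root.iter(tag):
            features.add("component::" + comp.get(ANDROID_NS + "name", ""))
    for action in root.iter("action"):
        features.add("intent::" + action.get(ANDROID_NS + "name", ""))
    return features

demo = """<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <application>
    <activity android:name=".MainActivity">
      <intent-filter><action android:name="android.intent.action.MAIN"/></intent-filter>
    </activity>
  </application>
</manifest>"""

print(extract_manifest_features(demo))
```

In a real pipeline the same string sets would be collected once per app and then mapped into the joint feature space described next.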
Dalvik Bytecode (dexcode). The classes.dex file embeds the compiled source code of an application, including all the user-implemented methods and classes. It may contain specific API calls that can access sensitive resources such as personal contacts (suspicious calls). Additionally, it contains all system-related, restricted API calls whose functionality requires permissions (e.g., using the Internet). Finally, this file can contain references to network addresses that might be contacted by the application.
Drebin performs a lightweight static analysis of Android applications. The extracted features are used to embed benign and malware apps into a high-dimensional vector space, train a machine-learning model, and then perform classification of never-before-seen apps. An overview of the system architecture is given in Fig. 1, and discussed in more detail below.
First, Drebin statically analyzes a set of available Android applications to construct a suitable feature space. All features extracted by Drebin are represented as strings and organized in 8 different feature sets, as listed in Table I.
| Android manifest | classes.dex |
| --- | --- |
| Hardware components | Restricted API calls |
| Requested permissions | Used permissions |
| Application components | Suspicious API calls |
| Filtered intents | Network addresses |
Android applications are then mapped onto the feature space as follows. Let us assume that an app is represented as an object $z \in \mathcal{Z}$, being $\mathcal{Z}$ the abstract space of all apk files. We then denote with $\phi : \mathcal{Z} \to \mathcal{X}$ a function that maps an apk file $z$ to a $d$-dimensional feature vector $x \in \mathcal{X} = \{0,1\}^d$, where each feature is set to 1 (0) if the corresponding string is present (absent) in the apk file $z$. An application encoded in feature space is thus a sparse binary vector, with ones only in correspondence of the strings extracted from the apk file.
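A minimal sketch of this binary embedding, with a hypothetical three-string vocabulary standing in for the full feature space:

```python
def embed(app_strings, vocabulary):
    """Map an app's extracted strings onto a fixed d-dimensional
    binary vector x, with x[i] = 1 iff vocabulary[i] is present."""
    present = set(app_strings)
    return [1 if feat in present else 0 for feat in vocabulary]

# hypothetical vocabulary built from the training corpus
vocab = ["permission::SEND_SMS", "api_call::getDeviceId", "url::example.com"]
x = embed({"permission::SEND_SMS", "url::example.com"}, vocab)
print(x)  # [1, 0, 1]
```

In practice $d$ is very large and the vector is stored in a sparse format, since each app contains only a tiny fraction of all observed strings.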
Learning and Classification. Drebin uses a linear SVM to perform detection. Its decision can be expressed in terms of a linear function $f : \mathcal{X} \to \mathbb{R}$, i.e., $f(x) = w^\top x + b$, where $w \in \mathbb{R}^d$ denotes the vector of feature weights, and $b \in \mathbb{R}$ is the so-called bias. These parameters, optimized during training, identify a hyperplane that separates the two classes in feature space. During classification, unseen apps are then classified as malware if $f(x) \geq 0$, and as benign otherwise.
Explanation. Drebin explains its decisions by reporting, for any given application, the most influential features, i.e., the features that are present in the given application and are assigned the highest absolute weights by the classifier. For instance, in Fig. 1, it is easy to see from its most influential features that a malware sample is correctly identified by Drebin, as it connects to a suspicious URL and uses SMS as a side channel for communication. As we aim to extend this approach to nonlinear models, in this work we also consider an SVM with the Radial Basis Function (RBF) kernel and a random forest to learn nonlinear functions.
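For a linear classifier this ranking is immediate: the contribution of each present feature is just its weight. A minimal sketch (the weights and feature names below are hypothetical):

```python
import numpy as np

def top_influential(w, x, names, k=5):
    """Rank the features present in the app (x[i] == 1) by the
    absolute weight the linear classifier assigns to them."""
    contrib = w * x                       # zero out absent features
    order = np.argsort(-np.abs(contrib))  # descending |contribution|
    return [(names[i], contrib[i]) for i in order[:k] if x[i] == 1]

w = np.array([1.2, -0.4, 0.05])
x = np.array([1, 1, 0])
names = ["url::suspicious.com", "permission::INTERNET", "api::sendTextMessage"]
print(top_influential(w, x, names, k=2))
```

Positive contributions push the score towards the malware class, negative ones towards the benign class, mirroring how Drebin's explanations are read.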
III Interpreting Decisions of Learning-based Black-box Android Malware Detectors
We discuss here our idea to generalize the explainable decisions of Drebin and other locally-explainable Android malware detectors [1, 2] to any black-box (i.e., nonlinear) machine-learning algorithm. We also propose a method to explain the global characteristics influencing the decisions of the learning-based malware detector at hand.
Local explanations. Previous work has highlighted that gradients and, more generally, linear approximations computed around the input point convey useful information for explaining the local predictions provided by a learning algorithm [3, 14]. The underlying idea is to identify as most influential those features associated with the highest (absolute) values of the local gradient $\nabla f(x)$, being $f(x)$ the confidence associated with the predicted class. However, in the case of sparse data, as for Android malware, these approaches tend to identify a high number of influential features which are not present in the given application, thus making the corresponding predictions difficult to interpret. For this reason, in this work we consider a slightly different approach, inspired by the notion of directional derivative. In particular, we project the gradient $\nabla f(x)$ onto $x$ to obtain a feature-relevance vector $\nu = \nabla f(x) \odot x$, where $\odot$ denotes the element-wise product. We then normalize $\nu$ to have unary $\ell_1$ norm, i.e., $r = \nu / \|\nu\|_1$, to ensure that only non-null features in $x$ are identified as relevant for the decision. Finally, the absolute values of $r$ can be ranked in descending order to identify the most influential local features.
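For a differentiable classifier the relevance vector above can be computed in closed form. The sketch below does this for a scikit-learn SVM with RBF kernel, using the analytic gradient of its decision function on toy binary data; all data and hyperparameter values here are illustrative, not the paper's setup:

```python
import numpy as np
from sklearn.svm import SVC

def rbf_svm_gradient(clf, x, gamma):
    """Analytic gradient of the SVC decision function with an RBF
    kernel: sum_i alpha_i * K(x, sv_i) * (-2 gamma) * (x - sv_i)."""
    diff = x - clf.support_vectors_                    # (n_sv, d)
    k = np.exp(-gamma * np.sum(diff ** 2, axis=1))     # kernel values
    return (clf.dual_coef_[0] * k) @ (-2.0 * gamma * diff)

def local_relevance(clf, x, gamma):
    """Relevance vector r = (grad f(x) ⊙ x) / l1-norm."""
    nu = rbf_svm_gradient(clf, x, gamma) * x           # project onto x
    norm = np.abs(nu).sum()
    return nu / norm if norm > 0 else nu

# toy sparse binary data with a simple hidden rule
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(40, 5)).astype(float)
y = (X[:, 0] + X[:, 3] > 1).astype(int)
gamma = 0.5
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)
r = local_relevance(clf, X[0], gamma)
print(r)  # non-zero entries only where X[0] has a 1
```

By construction, features absent from the app receive zero relevance, which is exactly what distinguishes this projection from ranking the raw gradient.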
Global explanations. In contrast to other locally-explainable malware detectors [1, 2], we also provide a global analysis of the interpretability of the considered machine-learning models, aimed at identifying the features which, on average, are most influential in characterizing benign and malware samples. Our idea is simply to average the relevance vectors $r$ over different sets of samples, e.g., separately for benign and malware data. Then, as in the local case, the absolute values of the average relevance vector can be ranked in descending order to identify the most influential global features.
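A minimal sketch of this averaging step (the two local relevance vectors below are toy values, assumed already $\ell_1$-normalized):

```python
import numpy as np

def global_relevance(relevance_vectors):
    """Average local relevance vectors over a class of samples and
    rank features by the absolute mean relevance."""
    R = np.vstack(relevance_vectors)
    mean_r = R.mean(axis=0)
    ranking = np.argsort(-np.abs(mean_r))  # most influential first
    return mean_r, ranking

r1 = np.array([0.6, -0.3, 0.1])
r2 = np.array([0.2, -0.7, 0.1])
mean_r, ranking = global_relevance([r1, r2])
print(mean_r, ranking)  # [ 0.4 -0.5  0.1] [1 0 2]
```

The same routine would be called once per class (benign, malware, or a single malware family) to obtain the class-wise global profiles.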
Non-differentiable models. Our approach works under the assumption that $f$ is differentiable and that its gradient is sufficiently smooth to provide meaningful information at each point. When $f$ is not differentiable (e.g., for decision trees and random forests), or its gradient vanishes (e.g., if $f$ becomes constant in large regions of the input space), we compute approximate feature-relevance vectors by means of surrogate models. The idea is to train a differentiable approximation $\hat{f}$ of the target function $f$, similar to what has been done in previous work for interpretability of non-differentiable models, and in [4, 15] to craft gradient-based evasion attacks against non-differentiable learning algorithms. For instance, to reliably estimate a non-differentiable algorithm $f$ (e.g., a random forest), one can train a nonlinear SVM on a training set relabeled with the predictions provided by $f$.
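The surrogate construction can be sketched with scikit-learn as follows; the toy data, the SVM hyperparameters, and the agreement check are illustrative assumptions, not the paper's exact experimental setup:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(200, 10)).astype(float)
y = (X[:, 0] * X[:, 1] + X[:, 2] > 0).astype(int)

# 1) train the non-differentiable target model
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# 2) relabel the training set with the target's predictions
y_rf = rf.predict(X)

# 3) fit a differentiable surrogate on the relabeled data
surrogate = SVC(kernel="rbf", C=10.0, gamma=0.1).fit(X, y_rf)

# how closely the surrogate mimics the target on these samples
agreement = np.mean(surrogate.predict(X) == y_rf)
print(f"surrogate/target agreement: {agreement:.2f}")
```

Gradients of the surrogate's decision function can then be used in place of the (undefined) gradients of the random forest when computing relevance vectors.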
IV Experimental Analysis
In this section, we use our approach to provide local and global explanations for linear and nonlinear (including non-differentiable) classifiers trained on the features used by Drebin. As we will see, this will also reveal some insights on their security against adversarial manipulations [8, 6].
Datasets. We use here the Drebin data, consisting of benign applications and malicious samples labeled using VirusTotal. A sample is labeled as malicious if it is detected by at least five anti-virus scanners, whereas it is labeled as benign if no scanner flags it as malware.
Training-test splits. We average our results on 5 runs. In each run, we randomly select 60,000 apps from the Drebin data to train the learning algorithms, and use the rest for testing.
Classifiers. We compare the standard Drebin implementation based on a linear SVM (SVM) against an SVM with the RBF kernel (SVM-RBF) and a (non-differentiable) Random Forest (RF). As discussed in Sect. III, a surrogate model is needed to interpret the RF; to this end, we train an SVM with the RBF kernel on the training set relabeled by the RF, yielding, on average, a high-accuracy approximation on the relabeled testing sets. The Receiver Operating Characteristic (ROC) curve of each classifier, averaged over the 5 repetitions, is reported in Fig. 2.
Parameter setting. We optimize the parameters of each classifier through a 3-fold cross-validation procedure. In particular, we optimize the regularization parameter $C$ for both the linear and nonlinear SVMs, the kernel parameter $\gamma$ for the SVM-RBF, and the number of estimators for the RF.
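Such a tuning procedure can be sketched with scikit-learn's 3-fold cross-validation; the parameter grids and toy data below are hypothetical, as the exact ranges are not reported here:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(120, 8)).astype(float)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# hypothetical search grids over C and the RBF kernel parameter gamma
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3).fit(X, y)
print(search.best_params_)
```

The same pattern applies to the RF by swapping in a RandomForestClassifier and a grid over the number of estimators.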
IV-A Local Explanations
Table II reports the top-10 influential features, sorted by their (absolute) relevance values, for three distinct samples classified by the linear SVM and the RF classifier, along with the probability of each feature being present in each class. Notably, relevant features can also be rare: a feature is deemed relevant even if it characterizes well only a small subset of samples in a given class (e.g., a malware family).
Case 1. The first example is a benign application misclassified by the SVM, and correctly classified by the RF. By inspecting the relevance scores of its features, it is evident that the RF is able to correctly classify this sample as benign, as several features are assigned a negative relevance score, while almost all of them are considered malicious (positive score) by the SVM. In both cases the use of SMS messages for communication is deemed suspicious; however, for the RF this is not sufficient evidence of maliciousness.
Case 2. The second example is a malware sample of the SmsWatcher family, which is correctly classified by the SVM, but not by the RF model, for a reason similar to the previous case: permissions and API calls related to SMS usage are not sufficient evidence of maliciousness for the RF. Indeed, this classifier does not even identify as suspicious the application components related to SMS usage, which instead constitute a signature for this malware family, as correctly learned by the linear SVM model.
Case 3. The last case is a malware sample of the Plankton family, correctly classified by both models, as they correctly identify the behavioral patterns of this family, associated with HTTP communication and actions.
IV-B Global Explanations
We performed a global analysis of the models learned by each algorithm by averaging the local relevance vectors over different classes of samples: benign, malware, and the malware families with the largest number of samples in the Drebin data (Table III). This gives us a global (mean) relevance vector for each class. Then, for each class of samples, we report a compact and a fine-grained analysis of the global feature-relevance values. In the compact analysis, we further average the global relevance over each feature set (Table I). In the fine-grained analysis, we simply report the global relevance scores of the top features (selected by aggregating the top 5 features with the highest average relevance score for each class of samples).
The results are shown in Fig. 3. The compact analysis highlights the importance of permissions and suspicious API calls in identifying malware. This is reasonable, as the majority of malware samples require permissions to perform specific actions, like stealing contacts and using SMS and other side channels for communication. The fine-grained analysis provides a more detailed characterization of the aforementioned behavior, highlighting how each classifier learns a specific behavioral signature for each class of samples. In particular, malware families are characterized by their communication channels (e.g., SMS and HTTP), by the amount of stolen information and accessed resources, and by specific application components or URLs.
Finally, note that all classifiers tend to assign high relevance to a very small set of features in each decision, both at a local and at a global scale. Given that manipulating the content of Android malware can be relatively easy, especially due to the possibility of injecting dead code, this behavior highlights a potential vulnerability of such classifiers. In fact, if the decisions of a classifier rely on few features, it is intuitive that detection can easily be evaded by manipulating only a few of them, as also confirmed in previous work [8, 6]. Conversely, if a model distributes relevance more evenly among features, evasion may be more difficult (i.e., it may require manipulating a higher number of features, which may not always be feasible). More robust learning algorithms for these tasks have been proposed based exactly on this rationale, which also has a theoretically-sound interpretation.
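One simple, hypothetical way to quantify this concentration (not part of the evaluation above) is to measure how much of the total relevance mass falls on the top-k features:

```python
import numpy as np

def relevance_concentration(r, k=5):
    """Fraction of total |relevance| carried by the k most relevant
    features -- a rough proxy for how few features an attacker would
    need to manipulate to sway the decision."""
    mass = np.sort(np.abs(r))[::-1]   # descending |relevance|
    total = mass.sum()
    return mass[:k].sum() / total if total > 0 else 0.0

r = np.array([0.5, 0.3, 0.1, 0.05, 0.03, 0.02])
print(relevance_concentration(r, k=2))  # 0.8
```

A value close to 1 for small k would flag a model whose decisions hinge on very few features, and which may thus be easier to evade.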
Another interesting point regards the transferability of evasion attacks across different models, i.e., the fact that an attack crafted against a specific classifier may still be successful with high probability against a different one. From our analysis, it is clear that in this case this property depends more on the available training data rather than on the specific learning algorithm: the three considered classifiers learn very similar patterns of feature relevances, as clearly highlighted in Fig. 3, which simply means that they can be evaded with very similar modifications to the input sample.
V Contributions, Limitations and Future Work
In this paper, we provided a general approach to achieve explainable malware detection on Android, applicable to any black-box machine-learning model. Besides providing a local and global understanding of how a machine-learning model makes its decisions, our approach can help analysts understand possible vulnerabilities of learning algorithms to well-crafted evasion attacks, along with their transferability properties. As future work, we plan to analyze different strategies to provide global explanations; in fact, averaging can potentially soften the contribution of features that are highly relevant only for a few samples. Another interesting issue is how to choose the surrogate model used to provide explanations for non-differentiable models. Some theoretical results show that, under certain assumptions, different learning algorithms can provide similar decision functions; e.g., nonlinear SVMs may reliably approximate random forests. Nevertheless, it remains to be investigated how different surrogate models impact the explanations provided by our approach. These are all relevant issues towards the development of interpretable models, as required by the novel European General Data Protection Regulation (GDPR). The right to explanation stated by the GDPR requires the development of models that are transparent with respect to their decisions. We believe that this work is a first step in this direction.
This work was partly supported by the EU H2020 project ALOHA, under the European Union’s Horizon 2020 research and innovation programme (grant no. 780788), and by the PISDAS project, funded by the Sardinian Regional Administration (CUP E27H14003150007).
-  D. Arp, M. Spreitzenbarth, M. Hübner, H. Gascon, and K. Rieck. Drebin: Efficient and explainable detection of android malware in your pocket. In Proc. 21st NDSS. The Internet Society, 2014.
-  M. Backes and M. Nauman. LUNA: quantifying and leveraging uncertainty in android malware analysis through Bayesian machine learning. In EuroS&P, pp. 204–217. IEEE, 2017.
-  D. Baehrens, T. Schroeter, S. Harmeling, M. Kawanabe, K. Hansen, and K.-R. Müller. How to explain individual classification decisions. J. Mach. Learn. Res., 11:1803–1831, 2010.
-  B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In ECML, vol. 8190, LNCS, pp. 387–402. Springer, 2013.
-  B. Biggio and F. Roli. Wild patterns: Ten years after the rise of adversarial machine learning. ArXiv, 2018.
-  A. Calleja, A. Martin, H. D. Menendez, J. Tapiador, and D. Clark. Picking on the family: Disrupting android malware triage by forcing misclassification. Expert Systems with Applications, 95:113 – 126, 2018.
-  S. Chen, M. Xue, Z. Tang, L. Xu, and H. Zhu. Stormdroid: A streaminglized machine learning-based system for detecting Android malware. In ASIA CCS, pp. 377–388. ACM, 2016.
-  A. Demontis, M. Melis, B. Biggio, D. Maiorca, D. Arp, K. Rieck, I. Corona, G. Giacinto, and F. Roli. Yes, machine learning can be more secure! A case study on Android malware detection. IEEE Trans. Dependable and Secure Computing, In press.
-  F. Doshi-Velez and B. Kim. Towards A Rigorous Science of Interpretable Machine Learning. ArXiv, 2017.
-  I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
-  B. Goodman and S. Flaxman. European Union regulations on algorithmic decision-making and a “right to explanation”. ArXiv, 2016.
-  P. W. Koh and P. Liang. Understanding black-box predictions via influence functions. In ICML, 2017.
-  Z. C. Lipton. The mythos of model interpretability. In ICML Workshop on Human Interpretability in Machine Learning, pp. 96–100, 2016.
-  M. T. Ribeiro, S. Singh, and C. Guestrin. Why should I trust you?: Explaining the predictions of any classifier. In KDD, pp. 1135–1144. ACM, 2016.
-  P. Russu, A. Demontis, B. Biggio, G. Fumera, and F. Roli. Secure kernel machines against evasion attacks. In AISec, pp. 59–69. ACM, 2016.
-  R. Sommer and V. Paxson. Outside the closed world: On using machine learning for network intrusion detection. In IEEE Symp. Security and Privacy, pp. 305–316. IEEE CS, 2010.
-  C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In ICLR, 2014.
-  L. Breiman. Some infinity theory for predictor ensembles. Technical Report 579, Statistics Dept. UCB, 2000.