
Do Gradient-based Explanations Tell Anything About Adversarial Robustness to Android Malware?

by   Marco Melis, et al.

Machine-learning algorithms trained on features extracted from static code analysis can successfully detect Android malware. However, these approaches can be evaded by sparse evasion attacks that produce adversarial malware samples in which only a few features are modified. This can be achieved, e.g., by injecting a small set of fake permissions and system calls into the malicious application, without compromising its intrusive functionality. To improve adversarial robustness against such sparse attacks, learning algorithms should avoid providing decisions which rely upon only a small subset of discriminant features; otherwise, manipulating even some of them may easily allow evading detection. Previous work showed that classifiers which avoid overemphasizing few discriminant features tend to be more robust against sparse attacks, and developed simple metrics to help identify and select more robust algorithms. In this work, we aim to investigate whether gradient-based attribution methods, used to explain classifiers' decisions by identifying the most relevant features, can also be used to this end. Our intuition is that a classifier providing more uniform, evenly-distributed attributions should rely upon a larger set of features, instead of overemphasizing few of them, thus being more robust against sparse attacks. We empirically investigate the connection between gradient-based explanations and adversarial robustness on a case study conducted on Android malware detection, and show that, in some cases, there is a strong correlation between the distribution of such explanations and adversarial robustness. We conclude the paper by discussing how our findings may thus enable the development of more efficient mechanisms both to evaluate and to improve adversarial robustness.
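The intuition above can be made concrete with a minimal sketch. Assuming a linear detector f(x) = w·x + b over binary static features, a common gradient-based attribution is gradient-times-input, r_i = w_i * x_i, and the evenness of the attribution distribution can be summarized, for instance, by its normalized entropy (1 means the decision spreads over many features; values near 0 mean it hinges on a few, making sparse evasion easier). The function names and the entropy-based evenness measure here are illustrative choices, not the specific metric proposed in the paper:

```python
import numpy as np

def attributions(w, x):
    """Gradient-times-input attributions for a linear model f(x) = w.x + b."""
    return w * x

def evenness(r, eps=1e-12):
    """Normalized entropy of |attributions|, in [0, 1].

    Higher values mean the decision relies on many features more
    uniformly; lower values mean a few features dominate.
    """
    p = np.abs(r)
    p = p / (p.sum() + eps)
    h = -(p * np.log(p + eps)).sum()
    return h / np.log(len(p))

# Two toy detectors over the same sample: one spreads its weight
# across all features, the other concentrates on a single one.
x = np.ones(4)
w_spread = np.array([0.25, 0.25, 0.25, 0.25])
w_peaked = np.array([0.97, 0.01, 0.01, 0.01])

print(evenness(attributions(w_spread, x)))  # close to 1
print(evenness(attributions(w_peaked, x)))  # much lower
```

Under this toy model, evading the peaked detector requires flipping only the one dominant feature, while the spread detector forces the attacker to manipulate several, which matches the paper's claim that evener attributions correlate with robustness to sparse attacks.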



