The term ransomware refers to attacks that lock the victim’s device or encrypt its data and then demand a sum of money to restore the compromised functionality. Despite the increasing diffusion of cloud-based technologies, users still store the majority of their data directly on their devices. For this reason, such attacks are particularly devastating, as they can destroy the sensitive data of private users and companies (which often neglect to back up such data). According to Symantec, the number of ransomware variants increased in by , with massive outbreaks such as the one concerning Ukrainian companies (Petya/NotPetya). Hence, it is not surprising that the same trend applied to mobile ransomware, with more than samples blocked in symantec-report18 .
Mobile ransomware typically differs from its X86 counterpart. As performing data encryption typically requires high-level privileges (especially to write to areas that are directly managed by the kernel), most attacks only lock the target device, either by making victims believe that their data are encrypted or by warning them that they are under police investigation for their actions (a strategy directly inspired by scareware-based approaches).
To counteract such attacks, Machine Learning has been increasingly used (especially combined with static analysis) both by researchers and anti-malware companies, either to perform direct detection or to generate signatures. While the goal of static detection systems is often to discriminate between generic malware and legitimate files (as in demontis17-tdsc ; rieck14-drebin ; chen16-asiaccs ; ahmadi17-cdmake ), some recently-released ones focus on further identifying malware families, in particular ransomware-related ones (also known as ransomware-oriented detection) andronio2015heldroid ; zheng16-securecomm ; garcia18-tosem . The reason for such a choice is that ransomware infections may lead to permanent data loss, making their early detection critical.
The main characteristic of systems that detect malware families is that they rely on different types of information extracted from multiple parts of the apps (e.g., bytecode, manifest, native libraries, and so forth rieck14-drebin ; andronio2015heldroid ; zheng16-securecomm ; chen16-asiaccs ; garcia18-tosem ), which leads to the use of large numbers of features (even hundreds of thousands). While this approach is tempting and may seem effective against the majority of attacks in the wild, it has various limitations. First, it is unclear which features are essential (and needed) for classification, an aspect that worsens the overall explainability of the system (i.e., why the system makes mistakes and how to fix them). Second, increasing the types of features extends the degrees of freedom of a skilled attacker to perform targeted attacks against the learning algorithm. For example, it would be quite easy to mask a specific IP address if the attacker understood that it plays a vital role in detection demontis17-tdsc . Finally, the computational complexity of such systems is enormous, which makes them impractical to use on mobile devices, an important aspect to guarantee offline, early detection of these attacks.
In a previous work maiorca17-sac , we proposed a system that could discriminate between ransomware, generic malware, and legitimate files by focusing on a small-sized feature set, i.e., System API packages. The idea of our work was to overcome the limitations described above by showing that it was possible to solve a machine learning problem with a limited number of features of the same type. However, System API-based information does not only include packages but also classes and methods (particularly employed in other works, especially mixed with other feature types rieck14-drebin ; garcia18-tosem ) that better define the behavior of APIs. Intuitively, using finer-grained information should lead to better accuracy and robustness in comparison to other approaches. In this paper, we explore such a possibility by extending our work in maiorca17-sac . In particular, we inspected the capabilities of multiple types of System API-related information to discriminate ransomware from malware and goodware. More specifically, we aimed to provide an answer to the following Research Questions:
RQ 1. Does the use of finer-grained information related to System API (i.e., classes and methods) improve detection performances in comparison to more general System API packages?
RQ 2. Is System API-based information suitable to detect novel attacks in the wild?
RQ 3. Does using System API-based information provide comparable performances to other approaches that employ multiple feature types?
RQ 4. Is System API-based information resilient against obfuscation attempts?
To answer such Research Questions, we explored three types of System API-based information: the first one only used information related to System API packages (as already shown in maiorca17-sac ), the second one analyzed System API classes, and the third one employed information related to System API methods. We evaluated the performances of the three systems on a wide range of ransomware, malware and goodware samples in the wild (including previously unseen data). Moreover, we tested all systems against a dataset of ransomware samples that have been obfuscated with multiple techniques (including class encryption).
The attained results showed that all System API-based techniques provided excellent accuracy at detecting ransomware and generic malware in the wild, while also showing the capability to predict novel attacks and resilience against obfuscation. From a methodological perspective, such results demonstrate that it is possible to develop accurate systems by strongly reducing the complexity of the examined information and by selecting feature types that represent how ransomware attacks behave.
Finally, we ported the System API package-based strategy to Android phones with the name of R-PackDroid (see also maiorca17-sac ). Our application, which can detect both ransomware and generic malware in the wild, shows that methodologies based on System API can be implemented with good computational performances even on old phones, and it is a demonstration of a fully working prototype deployed in real analysis environments. R-PackDroid can be downloaded for free from the Google Play Store (http://pralab.diee.unica.it/en/RPackDroid).
With this work, we claim that it is possible to create effective, deployable, and reasonably secure approaches for ransomware and malware detection by only using specific feature types. Hence, we believe that the attention of research should be shifted to finding effective and explainable feature types to make detection even more accurate and robust.
Paper structure. Section 2 provides the basic concepts of Android apps; Section 3 discusses the essential characteristics of Android ransomware and describes the key-intuitions behind using System API calls as critical information; Section 4 provides a description of the related work in the field; Section 5 describes the employed detection methodologies; Section 6 illustrates the experimental results attained with all the methodologies, as well as a comparison between our systems and other approaches in the wild; Section 7 describes the implementation details of R-PackDroid and its computational performances; Section 8 discusses the limitations of our work, which is finally concluded by Section 9.
2 Background on Android
Android applications are zipped .apk (i.e., Android application package) archives that contain the following elements: (i) The AndroidManifest.xml file, which provides the application package name, and lists its basic components, along with the permissions that are required for specific operations; (ii) One or more classes.dex files, which are the true executable of the application, and which contain all the implemented classes and methods (in the form of Dalvik bytecode) that are executed by the app. This file can be disassembled to a simplified format called smali; (iii) Various .xml files that characterize the application layout; (iv) External resources that include, among others, images and native libraries.
Although Android applications are typically written in Java, they are compiled to an intermediate bytecode format called Dalvik (which is further referred to as DexCode), whose instructions are contained in the classes.dex file. This file is then further parsed at install time and converted to native ARM code that is executed by the Android RunTime (ART). This technique greatly speeds up execution in comparison to the previous runtime (dalvikvm, available till Android ), in which applications were executed with a just-in-time approach (during installation, the classes.dex file was only slightly optimized, but not converted to native code).
3 Android Ransomware
The key point presented in this work is that the static extraction of System API-based information can be effective at detecting ransomware. More specifically, System APIs encapsulate many of the key actions performed by such attacks. To reinforce this concept, in the following, we describe the basic actions performed by Android ransomware. The majority of ransomware-based attacks for Android aim to lock the device screen while asking the victim for money in order to unlock it. According to the taxonomy proposed by chen18-tifs , there are multiple ways to do so: (i) by resorting to a hijacking activity (i.e., a screen that the user visualizes and with which she can interact) that is continuously shown; (ii) by setting up specific parameters of specific API calls; (iii) by disabling certain buttons, such as home or back.
Locking is generally preferred to data encryption strategies because it does not require operating on highly privileged data. Indeed, accessing specific areas of the Android internal memory would only be possible with root permissions. Conversely, locking the device does not require particularly high privileges, and allows the attacker to achieve the goal (i.e., scaring the victim) with minimum effort. The majority of locking screens show the victim text and images related to police activities or pornographic material. There are, however, samples that also perform data encryption. According to chen18-tifs , only four ransomware families possess the ability to encrypt data: Simplocker, Koler, Cokri and Fobus. In particular, some of these families employ a customized encryption algorithm, while others resort to standard algorithms.
As locking and encryption actions require the use of multiple functions that involve core functionalities of the system (e.g., managing entire arrays of bytes, displaying activities, manipulating buttons and so on), attackers tend to use functions that directly belong to the Android System API. It would be extremely time consuming and inefficient to build new APIs that perform the same actions as the original ones.
As an example of this behavior, consider the DexCode snippet provided by Listing 1, belonging to a locker-type ransomware (MD5: 0cdb7171bcd94ab5ef8b4d461afc446c). In this example, it is possible to observe that the two function calls (expressed by invoke-virtual instructions) that are actually used to lock the screen (lockNow) and reset the password (resetPassword) are System API calls, belonging to the class DevicePolicyManager and to the package android/app/admin. The same behavior is shown by Listing 2, which reports the encryption function employed by a crypto-type ransomware sample (MD5: 59909615d2977e0be29b3ab8707c903a). Again, the functions to manipulate the bytes to encrypt belong to the System API (read and close, belonging to the FileInputStream class of the java/io package; flush and close, belonging to the CipherOutputStream class of the javax/crypto package).
In an Android application, based on Java, multiple methods are associated with classes that belong to packages. Because of these characteristics, it is possible to encode and represent System API information by either using packages, classes, or methods. More specifically, methods and classes better detail the functionality performed by the single API, but their number is significantly higher in comparison to packages. Hence, a solution that would employ the analysis of API methods would be far more complex than one that analyzes packages.
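The package/class/method hierarchy described above can be read directly off a single smali invoke instruction. The following is a minimal sketch, not the authors' implementation: the regular expression, the `ApiRef` type, and the `parse_invoke` helper are hypothetical names introduced here, and the sample line is an assumed rendering of the lockNow call discussed earlier.

```python
import re
from typing import NamedTuple, Optional

# Matches the method reference of a smali invoke instruction, e.g.
#   invoke-virtual {v0}, Landroid/app/admin/DevicePolicyManager;->lockNow()V
# (hypothetical pattern; references on array types are deliberately not matched)
INVOKE_RE = re.compile(
    r"^\s*invoke-[\w/-]+\s+\{[^}]*\},\s*L([^;]+);->([^(\s]+)\("
)


class ApiRef(NamedTuple):
    package: str  # e.g. android/app/admin
    cls: str      # e.g. android/app/admin/DevicePolicyManager
    method: str   # e.g. android/app/admin/DevicePolicyManager/lockNow


def parse_invoke(line: str) -> Optional[ApiRef]:
    """Split one smali invoke line into package, class, and method names."""
    match = INVOKE_RE.match(line)
    if match is None:
        return None  # not an invoke instruction on a class type
    qualified_class, method_name = match.groups()
    # Everything before the last '/' of the class name is the package.
    package = qualified_class.rpartition("/")[0]
    return ApiRef(package, qualified_class, f"{qualified_class}/{method_name}")


if __name__ == "__main__":
    sample = ("invoke-virtual {v0}, "
              "Landroid/app/admin/DevicePolicyManager;->lockNow()V")
    print(parse_invoke(sample))
```

Each level of the returned tuple is a prefix of the next one, which is why the number of distinct methods is necessarily at least as large as the number of classes, and likewise for classes and packages.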
4 Related work
Most Android malware detectors discriminate between malicious and benign apps, and we refer to them as generic malware-oriented detectors. However, as the scope of this work is mostly oriented to ransomware detection, this Section will mainly focus on describing systems that specifically aim to detect such attacks (ransomware-oriented detectors). A brief description of the other detectors will be provided at the end of this Section.
The most popular and publicly available ransomware-oriented detector is HelDroid, proposed by Andronio et al. andronio2015heldroid . This tool includes a text classifier (based on NLP features) that works on suspicious strings used by the application, a lightweight smali emulation technique to detect locking strategies, and the application of taint tracking for detecting file-encrypting flows. The system has then been further expanded by Zheng et al. with the new name of GreatEatlon, which features significant speed improvements, a multiple-classifier system that combines the information extracted by text- and taint-analysis, and so forth zheng16-securecomm . However, despite using features oriented to ransomware detection, the final label provided for each analyzed sample by the released system is only malicious or benign, with no clear decision on whether the sample is ransomware or not. Furthermore, the system is still computationally demanding and it still strongly depends on a text classifier: the authors trained it on generic threatening phrases, similar to those that typically appear in ransomware or scareware. This strategy can be easily thwarted by employing, e.g., string encryption maiorca15-cose . Moreover, it strongly depends on the presence of a language dictionary for that specific ransomware campaign.
|Chen et al. (RansomProber) chen18-tifs||✓|
|Cimitile et al. (Talos) cimitile2017talos||✓|
|Gharib et al. (Dna-Droid) Gharib2017||✓||✓||✓|
|Song et al. song2016effective||✓|
|Zheng et al. (GreatEatlon) zheng16-securecomm||✓||✓||✓|
|Yang et al. yang2015automated||✓|
|Andronio et al. (HelDroid) andronio2015heldroid||✓||✓||✓|
Yang et al. proposed a tool to monitor the activity of ransomware by dumping the system messages log, including stack traces. Sadly, no implementation has been released for public usage yang2015automated .
Song et al. proposed a method that aims to discriminate between ransomware and goodware using process monitoring song2016effective . In particular, they considered system-related features representing the I/O rate, as well as the CPU and memory usage. The system has been evaluated with only one ransomware sample developed by the authors, and no implementation is publicly available.
Cimitile et al. introduced an approach to detect ransomware that is based on formal methods (by using a tool called Talos), which help the analyst identify malicious sections in the app code mercaldo2016ransomware ; cimitile2017talos . In particular, starting from the definition of payload behavior, the authors manually formulated logic rules that were later applied to detect ransomware. Unfortunately, such a procedure can become extremely time-consuming, as an expert must manually express such rules.
Gharib et al. proposed Dna-Droid, a static and dynamic approach in which applications are first statically analyzed, and then dynamically inspected if the first part of the analysis returned a suspicious result. The system uses Deep Learning to provide a classification label Gharib2017 . The static part is based on textual and image classification, as well as on API calls and application permissions. The dynamic part relies on sequences of API calls that are compared to malicious sequences, which are related to malware families. This approach has the drawback that heavily obfuscated apps can escape the static filter, thus avoiding dynamic analysis. Finally, Chen et al. proposed RansomProber, a purely dynamic ransomware detector which employs a set of rules to monitor different aspects of the app execution, such as the presence of encryption or anomalous layout structures. The attained results report a very high accuracy, but the system has not been publicly released yet (to the best of our knowledge).
Table 1 shows a comparison between the state-of-the-art methods for specifically detecting or analyzing Android ransomware. It is possible to observe that there is a certain balance between static- and dynamic-based methods. Some of them also resort to Machine-Learning to perform classification. Notably, only HelDroid and GreatEatlon are currently publicly available.
Concerning generic malware-oriented detectors, Arp et al. proposed Drebin, a machine learning system that uses static analysis to discriminate between generic malware and trusted files. They extracted various features from both the Manifest file and the Android executable, including IP addresses, suspicious API calls, permissions, and so forth rieck14-drebin . Tam et al. introduced a system to perform dynamic analysis and detection of Android malware by analyzing the system calls performed by the application tam15-ndss . Avdieenko et al. used taint analysis to detect anomalous flows of sensitive data, a technique that made it possible to detect novel malware samples without previous knowledge avdieenko15-icse . Yang et al. analyzed malicious apps by defining and extracting the context related to security-sensitive events. In particular, the authors defined a model of context based on two elements: activation conditions (i.e., what makes specific events occur) and guarding conditions (i.e., the environmental attributes of a specific event) yang15-icse .
Aresu et al. clustered Android malware by using the network HTTP traffic generated by those applications aresu15-malcon . Such clusters can be used to generate signatures that allow discriminating between malware and legitimate applications. Canfora et al. experimentally evaluated two techniques for detecting Android malware: the first one is based on Hidden Markov Model (HMM), and the second one exploits Structural Entropy canfora2016hmm . The attained results showed that both techniques could be suitable for Android malware detection.
Chen et al. proposed StormDroid, a static and dynamic machine-learning based system that extracts information from API-calls, permissions and behavioral features chen16-asiaccs . Finally, Ahmadi et al. proposed IntelliAV, a generic malware-oriented detector that is publicly available. Such a detector provides a level of dangerousness for each app but does not directly specify the family nor the type of attack ahmadi17-cdmake .
Garcia et al. proposed RevealDroid, a static system for detecting Android malware samples and classifying them in families. The system employs features extracted from reflective calls, native APIs, permissions, and many other characteristics of the file. The attained results showed that RevealDroid was able to attain very high accuracy and resilience to obfuscation. However, the number of extracted features can be extremely high and depends on the training data garcia18-tosem .
Finally, for the sake of completeness, we mention here other recent works that have analyzed the topic of ransomware detection on X86 (in particular, on Windows platforms) by employing dynamic analysis techniques to perform early detection of the attack. Using such techniques avoids possible damage to the operating system and its files continella16-acsac ; kharaz16-usenix ; kolodenker17-asiaccs ; huang17-ccs .
We now describe the general structure of systems that employ System API information to identify ransomware, also known as ransomware-oriented detectors. While the majority of learning-based detection systems combine various types of information to detect as many attacks as possible, ransomware-oriented detectors tailor their detection to a smaller set of information (System API) that is typically employed in ransomware. However, as System APIs are also widely used in generic malware and legitimate files, this information type also allows detecting other attacks that differ from ransomware. In this way, it is possible to create a powerful, wide-spectrum detector that features a much lower complexity in comparison to other approaches. Typically, such systems take as input an Android application, analyze it and return three possible outputs: ransomware, generic malware or trusted. The analysis is performed in three steps:
Pre-Processing. In this phase, the application is analyzed to extract its DexCode. The required information is extracted by inspecting only the executable code; no analysis is performed on other elements, such as the application Manifest. Only specific lines of code, which will be described later in this Section, will be sent to the next module.
Feature Extraction (System API). In this phase, the code lines received from the previous phase are further analyzed to extract the related System API information (either packages, classes, or methods). The occurrence of such pieces of information is then counted, thus producing a vector of numbers (feature vector) that is sent to a classifier.
Classifier. Classification is carried out through a supervised approach, in which the system is trained with samples whose label (i.e., benign, generic malware or ransomware) is known. This technique has been used in previous works with excellent results rieck14-drebin ; demontis17-tdsc ; garcia18-tosem . In particular, our approaches employ Random Forest classifiers, which are especially useful to handle multi-class problems, and which are widely used for malware detection. The complexity of such classifiers depends on the number of trees that compose them. Such a number must be optimized during the training phase.
The structure above is graphically represented in Figure 1. In the following, we provide more details about each phase of the analysis, by focusing in particular on the type of features that can be extracted from the application.
5.1 Preprocessing and Feature Extraction
The general idea of the first two phases is to perform static analysis of the Dalvik bytecode contained in the classes.dex file. The goal is to retrieve the System API information employed by the executable code of the application. The choice of System API information is related to three basic ideas:
Coherence with actions. Most ransomware writers resort to System APIs to carry out memory- or kernel-related actions (for example, file encryption or memory management). Focusing on user-implemented APIs (as happens, for example, with Drebin rieck14-drebin ) exposes the system to the risk of being evaded by simply employing different packages to perform actions.
Independence from Training. System API calls are features independent of the training data that are used. As a consequence, it is less likely that applications are not correctly analyzed only because they employ never-seen-before APIs.
Resilience against obfuscation. Using heavy obfuscation routines typically leads to injecting System API-based code in the executable, which can be extracted and analyzed, making it possible to detect suspicious files.
Pre-processing is hence easily performed by directly extracting the classes.dex file from the .apk app. Since .apk files are essentially zipped archives, such an operation is rather straightforward.
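Because an .apk is a ZIP archive, the pre-processing step can be sketched with the Python standard library alone. This is an illustrative sketch rather than the tool used in the paper; the function name `read_dex_files` is a hypothetical helper, and it also collects additional `classesN.dex` entries, which apps with more than one executable file contain.

```python
import zipfile
from typing import Dict


def read_dex_files(apk) -> Dict[str, bytes]:
    """Return the raw bytes of every top-level classes*.dex entry of an APK.

    `apk` may be a filesystem path or a binary file-like object.
    """
    dex_files = {}
    with zipfile.ZipFile(apk) as archive:
        for name in archive.namelist():
            # classes.dex, classes2.dex, ... live at the root of the archive.
            if name.startswith("classes") and name.endswith(".dex"):
                dex_files[name] = archive.read(name)
    return dex_files
```

The returned bytes still have to be disassembled (for instance to smali) before invoke instructions can be inspected; that step is outside the scope of this sketch.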
Once pre-processing is complete, the classes.dex file is further analyzed by the feature extraction module, which inspects the executable file for all invoke-type instructions (i.e., all instructions related to invocations) contained in the classes.dex code. Then, each invocation is inspected to extract the relevant API information for each methodology, according to a System API reference list that depends on the operating system version (in our case, Android Nougat - API level - a widely-used API set). Only the API elements that belong to the reference list are analyzed. In this paper, we consider three different methodologies, based on, respectively, package, class, and method extraction. If a specific API element is found, its occurrence value is increased by one.
In the following, we provide a more detailed description of the methodologies employed in this paper, by referring to the example reported in Listing 3. The code is parsed in three ways, according to each feature extraction strategy. For each example, we used a very small subset of the employed reference API.
Packages Extraction. In this methodology, we extract the occurrences of the System API packages (a total of reference features), in the same way as in our previous work maiorca17-sac . In the example of Listing 3, we used a subset composed of three reference API packages: java/io, javax/crypto and java/lang. The four invoke instructions are related to the javax/crypto and java/io packages, each of which is counted twice. The java/lang package is never used in this snippet. Hence, its value is zero.
Classes Extraction. In this methodology, we extract the occurrences of the System API classes (a total of reference features). Notably, such classes belong to the System API packages of the previous methodology (and, for this reason, their number is significantly higher than that of packages). In the example of Listing 3, we used a subset composed of two reference API classes: java/io/FileInputStream and javax/crypto/CipherOutputStream, each of them appearing twice.
Methods Extraction. In this methodology, we extract the occurrences of the System API methods (a total of reference features). These methods belong to the System API classes of the previous methodology, leading to a very large number of features. This strategy is very similar to the ones employed by other systems (e.g., rieck14-drebin ; garcia18-tosem ), which have used these features together with user-implemented APIs and other features. In the example of Listing 3, we used a subset composed of four reference API methods: java/io/FileInputStream/read, javax/crypto/CipherOutputStream/flush, javax/crypto/CipherOutputStream/close and java/io/FileInputStream/close. Each API call appears only once. Note that, although there are two methods named close, they belong to two different classes, and they are therefore considered as different methods.
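The three counting strategies can be illustrated on the four invocations described for Listing 3. The sketch below is only an approximation of the procedure: the smali lines in `SNIPPET` are hypothetical reconstructions consistent with the description (the registers are invented), the class is written with its actual System API name, javax/crypto/CipherOutputStream, and `feature_vectors` is an illustrative helper rather than the authors' code.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Captures the class and method name of a smali invoke instruction.
INVOKE_RE = re.compile(r"invoke-[\w/-]+\s+\{[^}]*\},\s*L([^;]+);->([^(\s]+)\(")

# Hypothetical smali lines matching the four invocations of the example.
SNIPPET = [
    "invoke-virtual {v1, v2}, Ljava/io/FileInputStream;->read([B)I",
    "invoke-virtual {v3}, Ljavax/crypto/CipherOutputStream;->flush()V",
    "invoke-virtual {v3}, Ljavax/crypto/CipherOutputStream;->close()V",
    "invoke-virtual {v1}, Ljava/io/FileInputStream;->close()V",
]

# Tiny reference lists, as in the example; real lists are far larger.
REFERENCE = {
    "packages": ["java/io", "javax/crypto", "java/lang"],
    "classes": ["java/io/FileInputStream", "javax/crypto/CipherOutputStream"],
    "methods": [
        "java/io/FileInputStream/read",
        "javax/crypto/CipherOutputStream/flush",
        "javax/crypto/CipherOutputStream/close",
        "java/io/FileInputStream/close",
    ],
}


def feature_vectors(lines: Iterable[str],
                    reference: Dict[str, List[str]]) -> Dict[str, List[int]]:
    """Count occurrences of reference packages, classes, and methods."""
    counts = {"packages": Counter(), "classes": Counter(), "methods": Counter()}
    for line in lines:
        match = INVOKE_RE.search(line)
        if match is None:
            continue  # only invoke-type instructions are inspected
        cls, method = match.groups()
        counts["packages"][cls.rpartition("/")[0]] += 1
        counts["classes"][cls] += 1
        counts["methods"][f"{cls}/{method}"] += 1
    # Keep only reference elements, in reference order; unseen ones count 0.
    return {level: [counts[level][name] for name in names]
            for level, names in reference.items()}


if __name__ == "__main__":
    for level, vector in feature_vectors(SNIPPET, REFERENCE).items():
        print(level, vector)
```

On this snippet the package vector is `[2, 2, 0]`, the class vector is `[2, 2]`, and the method vector is `[1, 1, 1, 1]`, matching the counts given in the text. Elements outside the reference list are silently ignored, which is what keeps the feature space independent of the training data.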
6 Experimental Evaluation
In this Section, we report the experimental results attained from the evaluation of the three API-based strategies. Note that, for the sake of simplicity and speed, we did not run the experiments on Android phones, but on an X86 machine. However, we built a full, working implementation of one of the three approaches, which can be downloaded from the Google Play Store (see next Section).
The rest of this Section is organized as follows: we start by providing an overview of the dataset employed in our experiments. Then, we describe the results attained by four evaluations. The first one aimed to establish the general performances of API-based approaches by considering random distributions of training and test samples. The second one aimed to show how API-based approaches behaved when analyzing samples released after the training data. The third one aimed to show a comparison between our API-based approaches and other systems that employed mixed features. Finally, we evaluated the resilience of API-based approaches against obfuscation techniques and evasion attacks.
In the following, we describe the dataset employed in our experiments. Without considering obfuscated applications (which are going to be discussed in Section 6.3), we obtained and analyzed apps, which are organized in the three categories we mentioned in Section 5.
The 3017 samples used for our ransomware dataset were retrieved from the VirusTotal service (http://www.virustotal.com), which aggregates the detection of multiple anti-malware solutions, and from the HelDroid dataset (https://github.com/necst/heldroid) andronio2015heldroid . For the samples obtained from VirusTotal, we used the following procedure: (i) we searched and downloaded the Android samples whose anti-malware label included the word ransom; (ii) for each downloaded sample, we extracted its family by using the AVClass tool sebastian16-raid , which essentially combines the various labels provided by anti-malware solutions to create a unique label that identifies the sample itself; (iii) we considered only those samples whose family was coherent with ransomware behaviors, or was known to belong to ransomware.
In general, our goal was to obtain a representative corpus of ransomware to ascertain the prediction capabilities of API-based techniques. For this reason, the dataset includes families that perform both device locking (such as Svpeng and LockScreen) and encryption (such as Koler and SLocker). For a better description of the families above, please see Section 3.
6.1.2 Malware and Trusted
We considered a dataset composed of Android malware samples that do not belong to the ransomware category, taken from the following sources: (i) Drebin dataset, one of the most recent, publicly available datasets of malicious Android applications (https://www.sec.cs.tu-bs.de/~danarp/drebin/), which also contains the samples from the Genome dataset zhou2012dissecting ; (ii) Contagio, a popular free source of malware for X86 and mobile; (iii) VirusTotal. These samples were chosen to verify whether even non-ransomware attacks could be detected with features that are particularly effective at classifying ransomware samples.
In order to download trusted applications, we resorted to two data sources: (i) we crawled the Google Play market using an open-source crawler (https://github.com/liato/android-market-API-py); (ii) we extracted a number of applications from the AndroZoo dataset allix16-msr , which features a snapshot of the Google Play store, making it easy to access applications without crawling the Google services. We obtained applications that belong to all the different categories available on the market. We chose to focus on the most popular apps to increase the probability of downloading malware-free apps.
6.2 Experiment 1: General Performances
In this experiment, we evaluated the general performances of System API-based methods (described in Section 5) at detecting ransomware and generic malware. To do so, for each strategy, we randomly split our dataset by , thus using the first half to train the system and the second half to evaluate the system. The number of trees of the random forests was evaluated by performing a -fold cross-validation on the training data. We repeated the whole process times, and we averaged the results by also determining the standard deviation, in order to understand the dependence of the system on the training data.
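The evaluation protocol (random halves, repeated runs, mean and standard deviation) can be sketched as follows. This is a schematic outline under stated assumptions: `train_and_score` is a hypothetical callback standing in for training the classifier (including the cross-validated choice of the number of trees) and scoring it on the held-out half, and the number of repetitions is left as a parameter rather than guessed.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def repeated_half_splits(
    samples: Sequence,
    train_and_score: Callable[[List, List], float],
    repetitions: int,
    seed: int = 0,
) -> Tuple[float, float]:
    """Average a score over several random train/test splits into halves.

    Returns (mean, sample standard deviation) of the scores.
    """
    rng = random.Random(seed)  # fixed seed for reproducible splits
    scores = []
    for _ in range(repetitions):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, test = shuffled[:half], shuffled[half:]
        scores.append(train_and_score(train, test))
    mean = statistics.mean(scores)
    # The sample standard deviation needs at least two runs.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std
```

A small standard deviation across repetitions indicates that the reported performance does not hinge on one particular choice of training samples.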
Considering the multi-class nature of the problem, we represented the results by calculating the ROC curve for each API-based strategy in two different cases:
Ransomware against benign samples. The crucial goal of our work is to detect ransomware attacks and, more importantly, to avoid classifying them as benign files. Such a mistake would most likely compromise the whole device by locking it or encrypting its data. For this reason, it is essential to verify whether ransomware attacks can be confused with benign samples.
Generic malware against benign samples. Even if System API-based strategies were employed to detect ransomware, they could also be used to classify generic malware (see Section 5). Hence, the goal here is to verify, from a practical perspective, if System API-based information can correctly detect other non-ransomware attacks and distinguish them from legitimate files.
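One plausible way to obtain such curves from a three-class classifier is to keep only the samples of the two classes of interest and sweep a threshold over the score assigned to the malicious class. The sketch below assumes exactly that; the source does not specify how the curves were derived, and `roc_points` and its parameters are hypothetical names.

```python
from typing import List, Sequence, Tuple


def roc_points(
    labels: Sequence[str],
    scores: Sequence[float],
    positive: str,
    negative: str = "benign",
) -> List[Tuple[float, float]]:
    """ROC points (FPR, TPR) for `positive` against `negative`.

    `scores[i]` is assumed to be the classifier's score for the positive
    class on sample i; samples of any other class are discarded.
    """
    pairs = [(score, label == positive)
             for label, score in zip(labels, scores)
             if label in (positive, negative)]
    n_pos = sum(1 for _, is_pos in pairs if is_pos)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples sharing the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points
```

Reading the detection rate at a fixed, low false positive rate from these points corresponds to the operating points discussed for the two cases above.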
Results are reported in Figure 2. Parts (a) and (b) show the ROC curves that describe the performances attained on ransomware and generic malware detection by the three System API-based methods (packages, classes, methods). By observing these curves, we can deduce the following facts:
All System API-based techniques were able to precisely detect more than of ransomware samples with only of false positives. Because our dataset included a wide variety of families, we claim that all strategies can detect the majority of ransomware families in the wild. Notably, there were no differences in results between using packages, classes, or methods. This means that, concerning general performances, using finer-grained features did not improve ransomware detection.
All System API methods featured good accuracy with relatively low false positives (around at , more than at ) at detecting generic malware. While using class-related features did not bring significant improvements to detection, using methods allowed for a improvement for false positive values inferior to .
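Statements of the form "detection rate at a given false positive rate" correspond to reading a single point off the ROC curve. The sketch below shows one way to compute that point from classifier scores. It assumes that a higher score means "more likely malicious" and that a sample is flagged when its score is at least the threshold; the function name and the score lists are illustrative only.

```python
def detection_rate_at_fpr(benign_scores, malicious_scores, max_fpr):
    """Best true positive rate achievable with a false positive rate <= max_fpr.

    A sample is flagged as malicious when its score is >= the threshold
    (assumption: higher score means more likely malicious).
    """
    thresholds = sorted(set(benign_scores) | set(malicious_scores), reverse=True)
    best_tpr = 0.0
    for t in thresholds:
        fpr = sum(s >= t for s in benign_scores) / len(benign_scores)
        if fpr > max_fpr:
            # Lower thresholds can only raise the false positive rate further.
            break
        tpr = sum(s >= t for s in malicious_scores) / len(malicious_scores)
        best_tpr = max(best_tpr, tpr)
    return best_tpr
```

Sweeping `max_fpr` over a range of values traces the ROC curve itself.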
To better understand the results attained by our strategies, parts (c), (d) and (e) report a ranking of the features used by the classifier for each strategy (respectively, packages, methods, and classes), according to their discriminant power. The ranking is calculated according to the Information Gain $IG(X, a)$ of each feature, given by the following formulation:
$$IG(X, a) = H(X) - H(X \mid a)$$
where $H(X)$ is the overall entropy for the whole dataset $X$ and $H(X \mid a)$ is the average entropy obtained by splitting the set using the attribute $a$. The higher the gain, the more relevant the feature is.
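As a worked illustration of this ranking criterion, the sketch below computes the information gain of a single feature as the entropy of the labels minus the size-weighted average entropy of the groups obtained by splitting on the feature value. It assumes discrete feature values (for instance, API present or absent); the function names are illustrative and not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the average entropy after splitting."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets induced by the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A feature that separates the classes perfectly yields a gain equal to the label entropy, while a feature that is independent of the label yields a gain of zero.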
Note that the information gain of each feature is not particularly high, meaning that the system does not overfit on specific information and that the final decision is taken by considering a combination of multiple features. At the same time, each feature value is reduced, in comparison to packages, by one order of magnitude for classes and methods. In other words, using more features allows for distributing the importance of the analyzed information across more elements. This characteristic is two-faced: while it makes the overall behavior of the system less interpretable, it may increase the effort that an attacker has to make to evade the system.
Analyzing the most discriminant methods can give a clearer idea of which information is used to classify applications. Features are related to string building (e.g., the ToString method), array management (e.g., ArrayList@size, ArrayList@remove), creation of folders (e.g., File@mkdirs), SMS, URI and layout management, and so forth. These features may be easily associated with both ransomware and generic malware behavior, and the same behavior is shown on classes and packages.
6.3 Experiment 2: Temporal Performances
In this experiment, we assessed the capabilities of System API-based methods at detecting ransomware samples that were first seen (according to the creation date of the classes.dex executable belonging to each application) after the data that were used as training set. This assessment is useful to understand if, without constant upgrades to its training set, such methods would be able to detect novel, unseen ransomware samples.
For this assessment, we included in the training set (along with all generic malware and trusted samples) ransomware samples that were first seen before a date , and we tested our system on a number of ransomware samples that were released on a date for which (the false positive threshold was set to %). We performed our tests by choosing different values of , where is December st, . Concerning test data, we were able to retrieve only a small number of samples whose first release date was between January and September . Conversely, we could retrieve a substantial number of samples whose was November . Hence, we considered three main ranges for : (i) January to September ; (ii) October ; (iii) November .
Results are provided in Figure 3, which shows that by training the system with data retrieved in , class- and method-based strategies can accurately detect ransomware test samples released in . The package-based strategy struggles with the test-set from November , performing significantly worse. However, this is a good result considering the minimal number of features employed by this strategy. Overall, we state that System API-based strategies can predict new ransomware attacks with good accuracy. In this case, using finer-grained features brings a consistent advantage to detection.
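The temporal protocol can be sketched as a date-based split. In this minimal example, all generic malware and trusted samples go to the training set, together with the ransomware first seen before a cutoff date; ransomware first seen on or after the cutoff forms the test set. The tuple layout, the label strings, and the "on or after" boundary are assumptions made for illustration.

```python
from datetime import date


def temporal_split(samples, cutoff):
    """Split samples by first-seen date.

    `samples` is a list of (sample_id, label, first_seen) tuples
    (hypothetical layout). Non-ransomware samples always go to training.
    """
    train, test = [], []
    for sample_id, label, first_seen in samples:
        if label != "ransomware" or first_seen < cutoff:
            train.append(sample_id)
        else:
            # Ransomware released on or after the cutoff is never trained on.
            test.append(sample_id)
    return train, test
```

Repeating the split with different test date ranges gives one accuracy value per range, which is how a plot such as Figure 3 could be populated.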
6.4 Experiment 3: Comparison with Other Approaches
This section compares System API-based strategies with other state-of-the-art approaches. We were particularly interested in comparing our approach to other publicly available ones, with a special focus on those that were specifically designed to detect ransomware. To this end, we performed a temporal comparison of all systems on the ransomware samples released in (for a total of samples) by using as training (when possible) all data released until .
The state-of-the-art approach that is closest to what we proposed in this paper (while being publicly available at https://github.com/necst/heldroid) is GreatEatlon zheng16-securecomm . Notably, it was not possible for us to control the trained model of the system (it was only possible to choose among a restricted set of classifiers), or to train it with new data. Nevertheless, the system was released in , meaning that data that was first seen in was certainly not included in its training set. Although it is not specifically tailored to ransomware detection, we also tested the performances of RevealDroid garcia18-tosem (which is publicly available at https://seal.ics.uci.edu/projects/revealdroid/) on the same test data. In this case, we could train the system with the same data used in our systems, which allowed us to provide a fairer comparison. Finally, we also tested the performances of the Android version of IntelliAV (available on the Google Play Store) ahmadi17-cdmake . As with GreatEatlon, we could not control the training data of the system. Moreover, as IntelliAV reports three levels of risk for each app (safe, suspicious, risky), we also considered as malicious the files that were labeled as suspicious by the system.
As the classifier for GreatEatlon, we chose Stochastic Gradient Descent (SGD), since this was the classifier that performed best on our test samples. Concerning RevealDroid, we chose the linear SVM classifier, as this was the one that provided the best results in the original work garcia18-tosem . IntelliAV only employed Random Forests. Results are reported in Table 3.
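[Table 3: detection results for System API (Methods), System API (Classes), and System API (Packages), compared with GreatEatlon, RevealDroid, and IntelliAV; table values not available.]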
The attained results show that System API-based techniques obtained very similar performances to RevealDroid (which, however, could only classify samples as either malware or benign). Such results are particularly interesting if we consider that RevealDroid extracted a huge number of features (more than ) from multiple characteristics of the file, including native calls, permissions, and executable code analysis, which also depended on the training data. With a much simpler set of information, we were able to obtain very similar accuracy. This result is especially interesting from the perspective of adversarial attacks, as using fewer features for classification can make the system more robust against them (the attacker can manipulate less information to evade the system) melis18-eusipco . The performances attained by System API-based approaches were also better than those of IntelliAV, which employed a combination of different features (including permissions, user-defined API, and more). System API-based strategies also performed significantly better than GreatEatlon, which also based its detection on information extracted from strings and language properties. Notably, using methods significantly improved accuracy in comparison to packages and classes, in line with what was obtained in Experiment .
6.5 Experiment 4: Resilience against Obfuscation
The goal of this experiment was to assess the robustness of System API-based strategies against obfuscated samples, i.e., to understand whether the application of commercial tools to samples could influence the detection capability of the systems. This evaluation is important, as commercial obfuscation tools are quite popular nowadays since they introduce good protection layers against static analysis (e.g., to prevent pieces of legitimate applications from being copied). Previous works showed that attackers could exploit this aspect by obfuscating malware samples with such tools, thus managing to bypass anti-malware detection maiorca15-cose .
In this experiment, we primarily focused on obfuscated samples whose original (i.e., non-obfuscated) variant was already included in the training set. Such a choice was made because we wanted to assess if obfuscation was enough to influence the key-features of System API-based methods, thus changing the classifiers’ decision for a sample whose original label was malicious.
To this end, we employed a test-bench of ransomware obfuscated with the tool DexProtector (https://dexprotector.com/), a popular, commercial obfuscation suite that allows for protecting Android applications through heavy code obfuscation. Although such a tool is mostly used for legitimate purposes (e.g., protection of intellectual property), it can also be used by attackers to make malicious applications harder to detect. Out of the ransomware samples, we could obfuscate samples (the remaining could not be obfuscated due to errors of the obfuscation software) with three different strategies (for a total of obfuscated samples). The strategies employed to obfuscate samples were the following:
String Encryption. This strategy encrypts strings that are related to const-string instructions, and injects a user-implemented method that performs decryption at runtime.
Resource Encryption. It encrypts the external resources contained in the res and assets folders. To do so, it adds System API information to the classes.dex file, in order to properly manage the encryption routines.
Class Encryption. This strategy encrypts user-implemented classes, and injects routines that dynamically load such classes at runtime.
Figure 4 reports the accuracy attained by the three System API-based strategies against the obfuscated samples. Such results show that all the detection strategies (without significant differences between each other) are resilient against obfuscation attempts. However, Class Encryption deserves separate consideration. This strategy employs heavy obfuscation, and it was explicitly performed to defeat static analysis. Typically, none of the static-based techniques that analyze the executable file should be able to detect such attacks correctly. However, this obfuscation strategy introduces a very regular sequence of System API-based routines that manage runtime decryption of the executable contents.
For this reason, it is sufficient to inject only one sample inside the training set to make all obfuscated samples detectable. Hence, we added the mark to Class Encryption. Notably, this may create false positives when legitimate samples are obfuscated with the same strategy. Nevertheless, such applications are rare, as Class Encryption strongly decreases the application performances maiorca15-cose , and much simpler obfuscation techniques are generally used.
7 Implementation and Performances
Although many solutions have been proposed to detect ransomware and generic malware, almost none (with the exception, for example, of ahmadi17-cdmake ) was ported to Android devices, often due to the complexity of the proposed approaches. However, an offline, on-device solution is very useful to perform early detection of applications downloaded, for example, from third-party markets (which are more exposed to malware). For this reason, and also to demonstrate the suitability of System API-based approaches, we ported the simplest of the three strategies (Package-based) with the name of R-PackDroid (as it implements the same approach introduced in our previous work maiorca17-sac ). This implementation scans any downloaded, installed, or updated application, and it classifies it as ransomware, malware, or legitimate. If an application is found to be malicious, the user can immediately remove it.
R-PackDroid has been designed to work on the largest possible number of devices. Hence, during its development, we focused on optimizing its speed and battery consumption. For this reason, we avoided any textual parsing of bytecode lines (which can be attained by transforming the .dex file to multiple .smali files with ApkTool). Instead, we resorted to DexLib, a powerful parsing library that is part of the baksmali disassembler (https://github.com/JesusFreke/smali) and is used by ApkTool itself, to directly extract method calls and their related packages. This library allowed us to analyze method calls with very high precision, and it significantly reduces the risk of bugs or wrong textual parsing in the analysis phase.
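To make the three feature granularities concrete, the sketch below derives package, class, and method features from a list of invoked method references written in smali notation (e.g., `Ljava/util/ArrayList;->size()I`). It does not use DexLib; it only illustrates the mapping once the method references are available. The set of prefixes treated as System API, the use of occurrence counts, and the function name are assumptions made for this example, while the `Class@method` naming mirrors the notation used in Section 6.2.

```python
from collections import Counter

# Assumption: namespaces treated as System API in this sketch.
SYSTEM_PREFIXES = ("android.", "java.", "javax.")


def api_features(method_refs):
    """Count System API packages, classes, and methods among invoked methods.

    `method_refs` is an iterable of smali-style references such as
    'Ljava/util/ArrayList;->size()I'. User-implemented code is skipped.
    """
    packages, classes, methods = Counter(), Counter(), Counter()
    for ref in method_refs:
        owner, _, rest = ref.partition(";->")
        if not owner.startswith("L") or not rest:
            continue  # not a plain class reference (e.g., an array type)
        class_name = owner[1:].replace("/", ".")
        if not class_name.startswith(SYSTEM_PREFIXES):
            continue  # user-implemented or third-party code
        package, _, simple_name = class_name.rpartition(".")
        method_name = rest.split("(", 1)[0]
        packages[package] += 1
        classes[class_name] += 1
        methods[f"{simple_name}@{method_name}"] += 1
    return packages, classes, methods
```

The package-level counter is the smallest of the three, which is consistent with the choice of porting the package-based strategy to the device.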
The classification model has been implemented by using Tensorflow (https://www.tensorflow.org/), an open-source machine learning framework that has also been designed to be used on mobile phones. In particular, we adapted its Random Forest implementation (TensorForest) to the Android operating system. Notably, our Android application only performs classification by using a previously trained classifier. The training phase is carried out separately, on standard X86 architectures. This choice was made to ensure the maximum ease of use for the final user, thus reducing the risk of invalidating the existing model.
R-PackDroid is available for free on the Google Play Store (for the moment, Android versions until are supported).
7.1 Computational Performances
We analyzed the computational performances of R-PackDroid by running it both on X86 and Android environments. In particular, we focused on extracting the time interval between the .apk loading and the generation of the feature vector for benign samples (grouped by their .apk size); the elapsed time to classify a sample, i.e., to read its feature vector and get the final label, is negligible. The choice of benign samples was made because they are typically more complex to analyze than generic malware and ransomware. We first ran our experiments on a -core Xeon machine with GB of RAM. The attained results, shown in Figure 6, proved that our system could analyze even huge applications in less than seconds.
To evaluate the performances of R-PackDroid on a real Android phone, we ran the same analysis on a Nexus 5, a -years-old, -core device with GB of RAM, equipped with the 6.0.1 version of Android. Results are reported in Figure 7. Although the analysis was slower than on X86 machines, and although we were using, in this case, the slowest version of the algorithm, the average analysis time for very large apps was slightly more than seconds. This result was very encouraging, and it showed that R-PackDroid could be safely used even on old phones. The higher dispersion of the time values, in comparison to the ones reported in the previous figure, was possibly caused by the presence of other background processes on the device.
Finally, it is also important to observe that the analysis time is not strictly proportional to the .apk size, as the file may contain additional resources (e.g., images) that increase the .apk size, without influencing the size of the DexCode itself. For this reason, it was not surprising to see the attained average values did not necessarily increase with the .apk size.
8 Discussion and Limitations
Finding 1. System API-based information could be effectively used, alone, to properly distinguish ransomware from generic malware and legitimate applications.
Finding 2. Using finer-grained information (classes and methods), albeit involving more features in the analysis, brought significant improvements to accuracy when detecting previously unseen samples. Moreover, using API methods allowed for higher accuracy at low false positive values.
Finding 3. System API-based approaches could obtain comparable performances to other approaches that involved more features of different types.
Finding 4. System API-based approaches guaranteed robustness against typical obfuscation strategies such as string encryption. However, by including a few obfuscated samples in the training set, it was also possible to detect heavy, anti-static obfuscation techniques such as class encryption.
Finding 5. System API-based approaches were well suited to be ported and implemented on mobile devices, with excellent computational performances even on very large applications.
It is also interesting to discuss further the differences between employing a vast set of features (as it happens, e.g., with methods), and a tiny set (as it happens, e.g., with packages). While it is true that using packages could allow achieving detection performances similar to those of methods, there may be additional issues in using the former strategy from an adversarial perspective. For example, the feature ranking shown in Section 6.2 suggests that a skilled attacker, who may have advanced knowledge of the system (e.g., its most discriminant features), may undermine the detection capabilities of the system by changing a small number of features. Furthermore, adding packages can be significantly easier than adding specific methods, as some of the latter cannot be used without declaring specific parameters (we are not considering, in this case, the possibility of dead code, as it can be easily ruled out during the pre-processing phase). We plan to inspect the adversarial aspects of Android ransomware detection (by also employing, for example, attack algorithms such as the one used in biggio13-ecml ) in future work.
Another limitation to point out is that System API-based detection can be theoretically bypassed by an attacker who builds malicious samples by using their own routines in place of System API calls. However, this requires considerable effort, which is most of the time not compatible with performing fast and widespread attacks.
It is also worth noting that since Android Oreo (), Google introduced new defenses against background processes that are typical of ransomware (e.g., the ones that directly lock the device). However, this does not exclude other malicious actions on the application level. For this reason, it is always better to have an additional system that can detect attempts at performing malicious actions.
Finally, we also point out that, during our tests, we found samples that could not be analyzed due to crashes and bugs of the DexLib library, and that have therefore been excluded from our analysis. However, their percentage (regarding the whole corpus that we analyzed) is negligible (less than of the whole file corpus).
In this work, we provided a detailed insight into how System API-based information could be effectively used (also on a real device) to detect ransomware and to distinguish it from legitimate samples and generic malware. The attained experimental results demonstrated that, by using a compact set of information tailored to the detection of a specific malware family (ransomware), it was possible to achieve detection performances (also on other malware families) that were comparable to systems that employed a much more complex variety of information. Moreover, System API-based information also proved to be valuable to detect obfuscated samples that focused on hiding user-implemented information. Notably, although it is tempting to combine as many information types as possible to detect attacks (or to develop computationally heavy approaches), this may not be the only feasible way to construct accurate, reliable malware detectors. For this reason, we claim that future work should focus on developing reliable, small sets of highly discriminant features that cannot easily be manipulated by attackers (with a particular reference to machine learning attacks). Moreover, a clear understanding of the impact of each feature on the classifier's decision (also known as explainability) can help analysts understand the classifiers' errors and improve their detection capabilities.
This work was partially supported by the H2020 EU funded project NeCS [GA #675320], by the H2020 EU funded project C3ISP [GA #700294], and by the PISDAS project, funded by the Sardinian Regional Administration (CUP E27H14003150007). The authors also thank Marco Lecis for his valuable contribution to the paper experiments.
-  M. Ahmadi, A. Sotgiu, and G. Giacinto. Intelliav: Toward the feasibility of building intelligent anti-malware on android devices. In A. Holzinger, P. Kieseberg, A. M. Tjoa, and E. Weippl, editors, Machine Learning and Knowledge Extraction, pages 137–154, Cham, 2017. Springer International Publishing.
-  K. Allix, T. F. Bissyandé, J. Klein, and Y. Le Traon. Androzoo: Collecting millions of android apps for the research community. In Proceedings of the 13th International Conference on Mining Software Repositories, MSR ’16, pages 468–471, New York, NY, USA, 2016. ACM.
-  N. Andronio, S. Zanero, and F. Maggi. Heldroid: Dissecting and detecting mobile ransomware. In Recent Advances in Intrusion Detection (RAID), pages 382–404. Springer, 2015.
-  M. Aresu, D. Ariu, M. Ahmadi, D. Maiorca, and G. Giacinto. Clustering android malware families by http traffic. In 2015 10th International Conference on Malicious and Unwanted Software (MALWARE), pages 128–135, Oct 2015.
-  D. Arp, M. Spreitzenbarth, M. Hübner, H. Gascon, and K. Rieck. Drebin: Efficient and explainable detection of android malware in your pocket. In Proc. 21st Annual Network & Distributed System Security Symposium (NDSS). The Internet Society, 2014.
-  V. Avdiienko, K. Kuznetsov, A. Gorla, A. Zeller, S. Arzt, S. Rasthofer, and E. Bodden. Mining apps for abnormal usage of sensitive data. In Proceedings of the 37th International Conference on Software Engineering - Volume 1, ICSE ’15, pages 426–436, Piscataway, NJ, USA, 2015. IEEE Press.
-  B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli. Evasion attacks against machine learning at test time. In H. Blockeel, K. Kersting, S. Nijssen, and F. Železný, editors, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Part III, volume 8190 of Lecture Notes in Computer Science, pages 387–402. Springer Berlin Heidelberg, 2013.
-  G. Canfora, F. Mercaldo, and C. A. Visaggio. An hmm and structural entropy based detector for android malware: An empirical study. Computers & Security, 61:1–18, 2016.
-  J. Chen, C. Wang, Z. Zhao, K. Chen, R. Du, and G.-J. Ahn. Uncovering the face of android ransomware: Characterization and real-time detection. IEEE Trans. on Information Forensics and Security (TIFS), 13(5):1286–1300, May 2018.
-  S. Chen, M. Xue, Z. Tang, L. Xu, and H. Zhu. Stormdroid: A streaminglized machine learning-based system for detecting android malware. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, ASIA CCS ’16, pages 377–388, New York, NY, USA, 2016. ACM.
-  A. Cimitile, F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio. Talos: no more ransomware victims with formal methods. International Journal of Information Security, pages 1–20, 2017.
-  A. Continella, A. Guagnelli, G. Zingaro, G. De Pasquale, A. Barenghi, S. Zanero, and F. Maggi. Shieldfs: A self-healing, ransomware-aware filesystem. In Proceedings of the 32nd Annual Conference on Computer Security Applications, ACSAC ’16, pages 336–347, New York, NY, USA, 2016. ACM.
-  A. Demontis, M. Melis, B. Biggio, D. Maiorca, D. Arp, K. Rieck, I. Corona, G. Giacinto, and F. Roli. Yes, machine learning can be more secure! a case study on android malware detection. IEEE Trans. Dependable and Secure Computing, In press.
-  J. Garcia, M. Hammad, and S. Malek. Lightweight, obfuscation-resilient detection and family identification of android malware. ACM Trans. Softw. Eng. Methodol., 26(3):11:1–11:29, Jan. 2018.
-  A. Gharib and A. Ghorbani. Dna-droid: A real-time android ransomware detection framework. In Z. Yan, R. Molva, W. Mazurczyk, and R. Kantola, editors, Network and System Security: 11th International Conference, NSS 2017, Helsinki, Finland, August 21–23, 2017, Proceedings, pages 184–198, Cham, 2017. Springer International Publishing.
-  J. Huang, J. Xu, X. Xing, P. Liu, and M. K. Qureshi. Flashguard: Leveraging intrinsic flash properties to defend against encryption ransomware. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, October 30 - November 03, 2017, pages 2231–2244, 2017.
-  A. Kharaz, S. Arshad, C. Mulliner, W. Robertson, and E. Kirda. UNVEIL: A large-scale, automated approach to detecting ransomware. In 25th USENIX Security Symposium (USENIX Security 16), pages 757–772, Austin, TX, 2016. USENIX Association.
-  E. Kolodenker, W. Koch, G. Stringhini, and M. Egele. Paybreak: Defense against cryptographic ransomware. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS ’17, pages 599–611, New York, NY, USA, 2017. ACM.
-  D. Maiorca, D. Ariu, I. Corona, M. Aresu, and G. Giacinto. Stealth attacks: An extended insight into the obfuscation effects on android malware. Computers & Security, 51(C):16–31, June 2015.
-  D. Maiorca, F. Mercaldo, G. Giacinto, C. A. Visaggio, and F. Martinelli. R-packdroid: Api package-based characterization and detection of mobile ransomware. In Proceedings of the Symposium on Applied Computing, SAC ’17, pages 1718–1723, New York, NY, USA, 2017. ACM.
-  M. Melis, D. Maiorca, B. Biggio, G. Giacinto, and F. Roli. Explaining black-box android malware detection. In 26th European Signal Processing Conference, EUSIPCO 2018, Roma, Italy, September 3-7, 2018, pages 524–528, 2018.
-  F. Mercaldo, V. Nardone, A. Santone, and C. A. Visaggio. Ransomware steals your phone. formal methods rescue it. In International Conference on Formal Techniques for Distributed Objects, Components, and Systems, pages 212–221. Springer, 2016.
-  M. Sebastián, R. Rivera, P. Kotzias, and J. Caballero. Avclass: A tool for massive malware labeling. In Recent Advances in Intrusion Detection (RAID), volume 9854 of Lecture Notes in Computer Science, pages 230–253. Springer, 2016.
-  S. Song, B. Kim, and S. Lee. The effective ransomware prevention technique using process monitoring on android platform. Mobile Information Systems, 2016.
-  Symantec. Internet security threat report vol. 23, 2018.
-  K. Tam, S. J. Khan, A. Fattori, and L. Cavallaro. Copperdroid: Automatic reconstruction of android malware behaviors. In NDSS. The Internet Society, 2015.
-  T. Yang, Y. Yang, K. Qian, D. C.-T. Lo, Y. Qian, and L. Tao. Automated detection and analysis for android ransomware. In 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), pages 1338–1343. IEEE, 2015.
-  W. Yang, X. Xiao, B. Andow, S. Li, T. Xie, and W. Enck. Appcontext: Differentiating malicious and benign mobile app behaviors using context. In Proceedings of the 37th International Conference on Software Engineering - Volume 1, ICSE ’15, pages 303–313, Piscataway, NJ, USA, 2015. IEEE Press.
-  C. Zheng, N. Dellarocca, N. Andronio, S. Zanero, and F. Maggi. Greateatlon: Fast, static detection of mobile ransomware. In SecureComm, volume 198 of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, pages 617–636. Springer, 2016.
-  Y. Zhou and X. Jiang. Dissecting android malware: Characterization and evolution. In IEEE Symposium on Security and Privacy, pages 95–109. IEEE, 2012.