Efficient Black-box Optimization of Adversarial Windows Malware with Constrained Manipulations

03/30/2020 ∙ by Luca Demetrio, et al. ∙ 0

Windows malware detectors based on machine learning are vulnerable to adversarial examples, even if the attacker is only given black-box access to the model. The main drawback of these attacks is that they require executing the adversarial malware sample in a sandbox at each iteration of its optimization process, to ensure that its intrusive functionality is preserved. In this paper, we present a novel black-box attack that leverages a set of semantics-preserving, constrained malware manipulations to overcome this computationally-demanding validation step. Our attack is formalized as a constrained minimization problem which also enables optimizing the trade-off between the probability of evading detection and the size of the injected adversarial payload. We investigate this trade-off empirically, on two popular static Windows malware detectors, and show that our black-box attack is able to bypass them with only few iterations and changes. We also evaluate whether our attack transfers to other commercial antivirus solutions, and surprisingly find that it can increase the probability of evading some of them. We conclude by discussing the limitations of our approach, and its possible future extensions to target malware classifiers based on dynamic analysis.




I Introduction

Machine learning techniques are becoming ubiquitous in the field of computer security. Both academia and industry are investing time, money and human resources to apply these statistical techniques to solve the daunting task of malware detection. In particular, Windows malware is still a threat in the wild, as thousands of malicious programs are uploaded to VirusTotal every day (https://www.virustotal.com/it/statistics/). Modern approaches use machine learning to detect such threats at scale, leveraging many different learning algorithms and feature sets [saxe2015deep, kolosnjaji2016deep, hardy2016dl4md, david2015deepsign, incer2018adversarially, anderson2018ember, raff2017malware].

While these techniques have shown promising malware-detection capabilities, they have not been originally designed to deal with non-stationary, adversarial problems in which attackers can manipulate the input data to evade detection. This has been widely demonstrated in the last decade by work in the area of adversarial machine learning [huang2011adversarial, biggio18]. This research field studies the security aspects of machine-learning algorithms under attacks staged either at training or at test time. In particular, in the context of learning-based Windows malware detectors, it has been shown that it is possible to carefully optimize adversarial malware samples against the target system to bypass it [demetrio2019explaining, kolosnjaji2018adversarial, kreuk2018adversarial, castro2019armed, labaca-castro2019aimed, anderson2017evading, rosenberg2018generic, hu2017generating].

Many of these attacks have been demonstrated in the black-box setting in which the attacker has only query access to the target model [labaca-castro2019aimed, anderson2017evading, rosenberg2018generic, hu2017generating]. This really questions the security of such systems when deployed as cloud services, as they can be queried by external attackers who can in turn optimize their manipulations based on the feedback provided by the target system, until evasion is achieved.

These black-box attacks are, however, still not very efficient in terms of number of queries, complexity of their optimization process, and number of manipulations performed on the input sample. In particular, the optimization process is overly complex, as such attacks require executing the adversarial malware sample in a sandbox during each iteration to ensure that its intrusive functionality is preserved. This verification step is required for attacks that may either manipulate data in feature space (rather than considering realizable input modifications [tong2019improving]), or consider input transformations that may also break the functionality of the malware sample [xu2016automatically, labaca-castro2019aimed]. To preserve the malicious behavior of the malware sample, the attacker must indeed edit the content of the executable without altering its original semantics, i.e. by respecting all the rules imposed by the structure of the file format (e.g. adding sections, padding bytes, and perturbing particular header fields).

From the perspective of adversarial machine learning, another limitation is that these attacks achieve evasion by significantly manipulating the content of the input malware. Conversely, adversarial attacks should not only be successful, but also require minimal changes. In fact, they often optimize a trade-off between the probability of misclassification and the amount of changes performed on the input sample, which is crucial to understand and quantify the robustness properties of the learning algorithm under attack [biggio18].

In this paper, we consider two popular learning-based Windows malware detectors, built on features extracted from static code analysis (as described in Sect. II). We aim to overcome the aforementioned limitations by proposing a novel black-box attack (Sect. III) that enables crafting adversarial malware samples that are both realizable and efficient to optimize. Our attack leverages a set of semantics-preserving, constrained malware manipulations to overcome the computationally-demanding validation step required by recent attacks. It is formalized as a constrained minimization problem which optimizes the trade-off between the probability of evading detection and the size of the injected adversarial payload, via a specific regularization term, while also bounding the maximum number of queries to the target system.

Our empirical evaluation (Sect. IV) investigates this trade-off, and shows that our black-box attack is not only able to efficiently bypass the considered learning-based malware detectors, but that this is achieved after only a few iterations and changes. We also evaluate whether our attack transfers to other commercial antivirus solutions, and surprisingly find that it increases the probability of evading some of them. We discuss how related work differs from ours in Sect. V, and acknowledge the limitations of our work in Sect. VI. We conclude by discussing possible future extensions of this work (Sect. VII), including how to extend it to target malware classifiers based on dynamic analysis.

II Programs and Malware Detection

Programs are represented on disk in a particular format, called Windows Portable Executable (PE) (https://docs.microsoft.com/en-us/windows/win32/debug/pe-format). All Windows executable programs comply with this format, which tells the operating system (OS) how to load the executable in memory before execution. The Windows PE format consists of several components, shown in Figure 1:

Fig. 1: The Windows PE file format. Each colored section describes a particular characteristic of the program. (Image source: https://en.wikipedia.org/wiki/Portable_Executable, under Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/)
  • the DOS Header, which contains metadata for loading the executable inside a DOS environment, and the DOS stub, which contains a few instructions that print “This program cannot be run in DOS mode” if executed inside a DOS environment. These two components have been kept to maintain compatibility with older Microsoft operating systems. From the perspective of a modern application, the only relevant portions of the DOS Header are: (i) the magic number MZ, a two-byte signature for the file, and (ii) the four-byte integer at offset 0x3c, which works as a pointer to the real header. If either of these two values is altered, the program is considered corrupted, and it will not be executed by the OS;

  • the PE Header, which contains the magic number PE and the characteristics of the executable, such as the target architecture that can run the program, the size of the header and the attributes of the file;

  • the Optional Header, which contains the information needed by the OS for initializing and loading the program. It also contains offsets that point to useful structures, like the Import Address Table (IAT), needed by the OS for resolving dependencies, the Export Table offset, which indicates where to find functions that can be referenced by other programs, and more;

  • the Section table, which is a list of entries that describe the characteristics of each core component of the program, like the code section (.text), initialized data (.data), relocations (.reloc) and more.
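The two DOS Header fields discussed above can be checked with a few lines of Python. This is only a minimal sketch: the function names `locate_pe_header` and `make_toy_header` are hypothetical, the buffer is synthetic rather than a real executable, and the check assumes the usual little-endian layout and the four-byte `PE\0\0` signature.

```python
import struct


def locate_pe_header(data: bytes) -> int:
    """Return the offset of the PE signature, or raise ValueError if a check fails."""
    # (i) the two-byte magic number at the very start of the file
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ magic number")
    # (ii) the four-byte little-endian integer at offset 0x3c points to the real header
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    return pe_offset


def make_toy_header(pe_offset: int = 0x80) -> bytes:
    """Build a synthetic buffer with only the MZ magic, the pointer and the PE signature set."""
    buf = bytearray(pe_offset + 4)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, pe_offset)
    buf[pe_offset:pe_offset + 4] = b"PE\0\0"
    return bytes(buf)
```

Calling `locate_pe_header(make_toy_header())` returns the stored offset, while scrambling either field makes the function raise, mirroring the "corrupted program" case described above.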

The structure of an executable program can be useful for statically inferring information about its behavior. Indeed, most antivirus vendors apply static analysis to detect threats in the wild, without executing suspicious programs inside a controlled environment. This approach saves time and resources, since the antivirus programs do not execute the suspicious software inside the host OS. Static analysis serves as a first line of defense, and its performance is crucial for opposing the countless threats in the wild. We consider two state-of-the-art machine learning detectors that have been coded, trained and released on GitHub courtesy of EndGame (https://www.endgame.com/).

MalConv: an end-to-end convolutional neural network (CNN) proposed by Raff et al. [raff2017malware]. It takes as input the first 2 MB of an executable and returns the probability that it is malware. Since the input size is fixed a priori, a file longer than this threshold is truncated to fit, while a shorter file is padded with the value 0. Since the padding value should be unique, all byte values are shifted by one to maintain this distinction. Each byte is embedded inside a space with eight dimensions. The embedding is learnt by the network, as it is needed to impose a metric over bytes. The convolutional layers are used to correlate spatially distant bytes inside the input binary, e.g. jumps, function calls, the association between metadata and the sections it describes, and more. The threshold for deciding whether the input executable is malware or not is set to 0.5.

EMBER: an open-source dataset of benign and malicious programs, along with a classifier trained on top of this data source, proposed by Anderson et al. [anderson2018ember]. The model is implemented with Gradient Boosting Decision Trees (GBDT). Differently from MalConv, the authors extract many static features from the input binary, computing a vector of 2,381 features. In particular, they consider:

  • general file information, such as the virtual size of the file, number of imported and exported functions, the presence of particular sections, like debug sections, signature and many others;

  • header information, that takes into account the characteristics of the executable, the target architecture, the version and more;

  • the byte histogram, that counts the occurrences of the bytes, normalized w.r.t. the length of the sample;

  • the byte-entropy histogram, which captures the entropy of the byte distribution inside the file, by applying a sliding window over the binary. The result is a two-dimensional histogram, inspired by the work of Saxe et al. [saxe2015deep], flattened into a vector and normalized after the process;

  • information taken from strings that are at least five printable characters long. Additional information is attached to the strings, such as the number of occurrences and how many special markers they contain, such as C:\, HKEY or http and https;

  • section information, i.e. properties extracted from the sections that compose the binary. In particular, they extract the name, the length, the entropy and the virtual size;

  • the imported / exported functions, that are the functions required for running the program and the ones that are offered to the other executables at run-time. Each function is described as library:function_name.
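Two of the feature groups listed above, the normalized byte histogram and the sliding-window entropy, can be illustrated with a short sketch. This is not EMBER's feature extractor: the function names are hypothetical, the window and step sizes are assumed values chosen for illustration, and the sketch stops at per-window entropies instead of building the flattened two-dimensional histogram.

```python
import math
from collections import Counter


def byte_histogram(data: bytes) -> list[float]:
    """Occurrences of each byte value, normalized by the length of the sample."""
    counts = Counter(data)
    total = len(data) or 1  # avoid dividing by zero on empty input
    return [counts.get(value, 0) / total for value in range(256)]


def window_entropies(data: bytes, window: int = 2048, step: int = 1024) -> list[float]:
    """Shannon entropy (bits per byte) of each sliding window over the binary.

    The window and step defaults are assumptions made for this sketch.
    """
    entropies = []
    for start in range(0, max(len(data) - window, 0) + 1, step):
        chunk = data[start:start + window]
        if not chunk:
            break
        counts = Counter(chunk)
        size = len(chunk)
        entropy = -sum(c / size * math.log2(c / size) for c in counts.values())
        entropies.append(entropy)
    return entropies
```

A window holding every byte value equally often reaches the maximum of 8 bits per byte, while a window of identical bytes has zero entropy, which is why appended content can shift these features.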

Many of these feature sets are compressed inside a histogram by applying the hashing trick [moody1989fast], to reduce the dimension of the problem to a smaller and manageable space. From now on, we refer to the GBDT model trained on the EMBER dataset as EMBER. While the weaknesses of MalConv are known [demetrio2019explaining, kolosnjaji2018adversarial, kreuk2018adversarial], even EMBER is vulnerable to byte appending, as demonstrated by a challenge held by EndGame (https://towardsdatascience.com/evading-machine-learning-malware-classifiers-ce52dabdb713). The method described by the author of the write-up has two issues:

  • the content added to the samples is taken from a single file, and repeated multiple times inside the binary;

  • if the evasion is not achieved, the algorithm tries to append 1 KB of content that most decreases the score, thereby enlarging the payload even further and querying the detector hundreds of times.

Of course, the author of the write-up considered a white-box attack with direct access to the classifier and without any constraints, although the transformations applied to the input malware were feasible. From a security analyst's perspective, this is comparable to a vulnerability of the system, and counter-measures are necessary to patch these weaknesses. From an adversarial machine learning perspective, we want to understand how much effort the attacker needs to evade such detectors. This write-up serves as background for our analysis against EMBER, as we want to highlight that this model can be evaded with smaller, carefully-crafted perturbations. This noise should be applied following a strategy that can be repeated on every sample the attacker owns. Moreover, this strategy should be general enough to handle the constraints that the attacker must satisfy, exploiting feasible manipulations to reduce the overhead induced by the verification step.

III Black-Box Optimization of Adversarial Windows Malware

In the malware detection domain, where industries do not fully release the techniques they have developed, having white-box access to the target classifier is a strong assumption. However, this is required if one aims to craft efficient gradient-based attacks against it [biggio18]. To overcome this issue, the adversary may learn a surrogate model by querying the target, and attack the surrogate model with a (white-box) gradient-based attack. In many cases, the attacks optimized against the surrogate model have been shown to successfully transfer to the target model [papernot2017practical, goodfellow6572explaining, carlini2017towards].

Even if adversarial examples may be efficiently optimized via gradient-based methods, training a surrogate model may turn out to be a daunting task, as the attacker needs:

  • to choose a differentiable model, as the algorithms for computing adversarial examples require the computation of the gradient. Depending on the choice, training can be expensive in terms of time and number of machines to use. For instance, deep neural networks need large amounts of data to gain good generalization properties, and the use of GPU devices is recommended for speeding up the computations. On top of that, the choice of the architecture matters, as it requires further analysis regarding which components to include in the network, how many layers, how many neurons per layer, which activation function, and so on. If the attacker relies on standard statistical algorithms, e.g. SVM, they would need machines equipped with a large amount of RAM, as the algorithm must deal with giant matrices. Even if the attacker possesses enough computing power, it is still a time-consuming and error-prone process;

  • to address the presence of a feature extractor, as the target model may extract information from the input samples. This process is usually non-invertible, as it shrinks, compresses and aggregates information inside a mathematical space: given a feature vector, it is very difficult or impossible to re-create the original sample. Even if the model itself is differentiable and the attacker can understand which feature should be modified to change the confidence, the attacker has control only over the real sample. Applying a transformation to it might change the desired vulnerable feature alongside many others, leading the attack to fail. There is no direct mapping between the source and the feature vector, and the attacker cannot edit that representation at will.

Since these issues may be difficult to solve, the attacker may want to opt for a black-box strategy. In this context, no gradient method can be applied, but the adversary can ignore the implementation details of the target, relying only on the feasible manipulations that can be applied to the binary representation of the input malware. To this end, the attacker needs to formalize how these transformations can be applied, and needs an intelligent strategy for deciding which ones to apply to land successful evasion attacks. However, since this is a black-box strategy, the attacker needs to query the victim system multiple times, both to understand how it responds to the manipulations and to decide how to create adversarial examples. The target is likely to be deployed on a remote server, and each query produces network traffic. Each time the attacker applies a transformation to a sample, a request is sent to the target, containing the new version of the malware. In this context, the attacker wants to be efficient, by performing as few queries as possible and sending programs that are as small as possible. The attack strategy should be able to handle all these constraints, by weighting them and proposing a solution that, even if sub-optimal, suits the attacker's needs.

To this end, in this work we develop a novel black-box attack, named GAMMA (Genetic Adversarial Machine learning Malware Attack), which encapsulates both the evasion strategy and the constraints posed by the attacker. Figure 2 shows the outline of GAMMA, whose components are discussed separately in the following sections. This section is structured as follows: (i) Section III-A introduces the mathematical formalization of the attack itself and how to balance between the constraints, (ii) Section III-B introduces the feasible manipulations and why they preserve the semantics of an input program, while (iii) Section III-C introduces the strategy that the attacker follows for crafting adversarial malware examples in a black-box fashion.

Fig. 2: Outline of GAMMA: given a fixed set of benign sections, the algorithm generates payloads by extracting contiguous chunks of bytes from them. For each payload, GAMMA generates adversarial malware that is passed to the input detector, whose output is a probability of being malicious. The objective function is the sum of this score and a penalty term, controlled by an input regularization parameter. After having sent the allowed number of queries, GAMMA outputs the best adversarial malware found so far.

III-A Formalizing the attack

To quantify the skills of the attacker, we introduce the notion of the attack feature space. Intuitively, a point in this set corresponds to a sequence of actions that can be performed by the adversary without breaking the semantics of the input sample. In this scenario, the order of application of such actions is assumed not to influence the result of the attack. Each action is feasible: its application changes the representation and the corresponding feature vector of the sample, but it does not alter its behavior at run-time. Since the semantics is the same, the attacker does not need to test the sample inside a controlled environment, saving time and resources.

Formalization: let $\mathcal{Z}$ be the set of all possible binary programs. We can describe a program as a string of bytes, hence

$$\mathcal{Z} \subseteq \{0, \ldots, 255\}^{*},$$

that is the language containing all strings with arbitrary length obtained by concatenating elements in that set. Let

$$\mathcal{S}_k = \{0, 1\}^{k}$$

be the set containing the capabilities of the adversary, whose dimension is parametric over $k$, which specifies the number of actions that the attacker can combine. Each element $\mathbf{s} \in \mathcal{S}_k$ is a (sparse) vector whose entries are either $0$ or $1$. In this formalization, $s_i = 1$ means that the i-th transformation is applied to the sample, while no action is performed otherwise.

The attacker also needs a method for applying these actions to the sample, because the goal is to produce functioning malware from the original one. Let

$$g : \mathcal{Z} \times \mathcal{S}_k \rightarrow \mathcal{Z}$$

be a function that constructs a new program, injecting the manipulations described by the attack feature vector inside the input program. Let

$$f : \mathcal{Z} \rightarrow [0, 1]$$

be a function that attributes a malicious score to an input program. This is the classifier we consider during the attack. We assume nothing regarding the internals of this function, except for the output value, which is a continuous probability score. The goal of the attacker is to find a particular combination of manipulations that lowers the confidence attributed by the target detector below the detection threshold.

Attack as an optimization problem: The attacker must take into account both the number of queries sent to the detector and the amount of manipulations $C(\mathbf{s})$ applied to the malware sample. Since we deal with binary programs, $C(\mathbf{s})$ measures the number of bytes that are added to the input malware. Optimizing this quantity implies finding a set of actions that is both stealthy and fast to compute, since the number of queries is kept reasonably low. We denote with $T$ the maximum number of queries sent to the detector.

While the number of requests can be handled by fixing a limit on the number of queries, we can treat the size constraint as a penalty term, proportional to the number of added bytes, that is added to the quantity to minimize. This term acts as a regularization term, so that the objective accounts for both the reduction of the score attributed by the classifier and the number of bytes that are used to compute the adversarial example.

The problem can be written as:

$$\min_{\mathbf{s} \in \mathcal{S}_k} F(\mathbf{s}) = f(g(\mathbf{x}, \mathbf{s})) + \lambda \, C(\mathbf{s}) \quad \text{subject to } q \leq T,$$

where $\mathbf{x}$ is the input malware, $f$ is the score returned by the detector, $C$ is the size penalty, $q$ is the number of queries sent to the detector and $T$ its maximum, $\lambda$ is a parameter that controls the importance of the size constraint over the problem of finding evasive samples, and $g$ is the injection function. By tuning this parameter we ask for solutions with varying size of the payload. The problem is a discrete constrained minimization problem, because the attack feature space is discrete by definition. This implies the use of black-box optimization algorithms that are able to take into account discrete quantities. We show that the problem can be relaxed to a continuous optimization problem. In particular, we apply feasible transformations that can be decomposed, breaking the limitation posed by the discreteness of the attack feature space.
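The structure of this objective can be sketched in Python: a detector score plus a size penalty scaled by a regularization weight, with a hard cap on the number of queries. Everything here is an illustrative assumption, not the authors' implementation: `make_objective`, `QueryBudgetExceeded` and the three `toy_*` functions are hypothetical stand-ins for the detector, the injection function and the penalty.

```python
from typing import Callable, Sequence


class QueryBudgetExceeded(RuntimeError):
    """Raised when the objective is evaluated more often than the budget allows."""


def make_objective(
    detector: Callable[[bytes], float],
    inject: Callable[[bytes, Sequence[float]], bytes],
    penalty: Callable[[Sequence[float]], float],
    malware: bytes,
    lam: float,
    max_queries: int,
) -> Callable[[Sequence[float]], float]:
    """Build an objective: detector score of the manipulated sample plus lam * penalty."""
    queries = 0

    def objective(s: Sequence[float]) -> float:
        nonlocal queries
        if queries >= max_queries:
            raise QueryBudgetExceeded(f"budget of {max_queries} queries exhausted")
        queries += 1  # every evaluation costs one query to the detector
        return detector(inject(malware, s)) + lam * penalty(s)

    return objective


def toy_detector(program: bytes) -> float:
    """Stand-in detector: share of 0xFF bytes, a score in [0, 1]."""
    return program.count(0xFF) / max(len(program), 1)


def toy_inject(program: bytes, s: Sequence[float]) -> bytes:
    """Stand-in injection: append one zero byte per selected action."""
    return program + bytes(sum(1 for v in s if v))


def toy_penalty(s: Sequence[float]) -> float:
    """Stand-in size penalty: number of bytes added by toy_inject."""
    return float(sum(1 for v in s if v))
```

With a large `lam` the penalty dominates and empty payloads score best; with `lam` close to zero only the detector score matters, which is the trade-off the regularization parameter is meant to control.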

III-B Realizable attacks

Working with the file representation is daunting, since one single byte out of place can break the whole program. In the context of Windows PE binaries, only a few transformations can be applied without altering the semantics of a program. In particular, the most meaningful ones are:

  • appending an overlay, which means adding bytes at the end of the file;

  • adding a section, which also requires creating an entry inside the section table. Each section entry is 40 bytes long, so all content must be shifted by that amount, without violating the file and section alignments specified by the header;

  • adding import functions, by crafting an entry for the Import Address Table, that specifies which function of which library must be included during the loading process;

  • filling slack space between sections, that is inserted by the compiler to maintain the alignments inside the file. These bytes are usually set to zero, and they are never referenced by the code of the executable;

  • packing, by encrypting or encoding the content of the binary inside another binary and decoding it at run-time. The effect of a packer is invasive, since the whole structure of the input sample is modified.
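The alignment bookkeeping behind the "adding a section" manipulation can be illustrated with a small arithmetic sketch. The helper names are hypothetical, the 40-byte entry size comes from the list above, and the example deliberately ignores the case where the header region already has enough slack to host the new entry.

```python
SECTION_ENTRY_SIZE = 40  # size in bytes of one section table entry


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return ((value + alignment - 1) // alignment) * alignment


def first_section_offset_after_insert(headers_end: int, file_alignment: int) -> int:
    """File offset where section data can start once one more table entry is added.

    headers_end is the unaligned end of the headers before the insertion;
    both arguments are illustrative inputs, not values read from a real file.
    """
    return align_up(headers_end + SECTION_ENTRY_SIZE, file_alignment)
```

For instance, with headers ending at 0x1F8 and a file alignment of 0x200, the extra entry pushes the end of the headers past 0x200, so the section data would have to start at 0x400 instead.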

Motivated by the write-up of the challenge, we decided to consider the appending of an overlay. This mutation is trivial to apply, as it does not require any specific tool, and it is effective against both EMBER and MalConv. The content that is added at the end of the input malware samples is taken directly from sections belonging to legitimate executables. Intuitively speaking, we believe that the addition of content taken from goodware programs would mislead the detector when it computes the probability of being malware. In this context, each dimension of the attack feature space is the content of a benign section: the parameter $k$ of $\mathcal{S}_k$ expresses the number of benign sections that the attacker wants to use during the attack. The benign content can be added in chunks of variable size, which is helpful for relaxing the problem to the continuous domain.

Continuous relaxation: The problem can be formulated as a continuous constrained minimization problem, by allowing the use of real numbers inside the attack feature space. We re-define the attack feature space as $\mathcal{S}_k = [0, 1]^{k}$, so that each entry is a real number between 0 and 1. Since we are adding bytes, we crop the content of the i-th section by taking only the fraction $s_i$ of the original content when we add the section to the input malware. To clarify this concept, if $s_i = 0.4$, the algorithm takes 40% of the content of the i-th section. The problem is now continuous, and it can be solved using any algorithm that does not compute derivatives of the black-box function. In this context, the byte-constraint function can be written as

$$C(\mathbf{s}) = \langle \mathbf{b}, \mathbf{s} \rangle = \sum_{i=1}^{k} b_i \, s_i,$$

where $\mathbf{b}$ is a vector containing the number of bytes of each section. This product is needed since not all sections share the same length, and they must be penalized accordingly. The constraint is expressed as a dot product. Since both the vector of sizes and the attack feature vector have only non-negative entries, the dot product is equal to the $\ell_1$ norm of the attack feature vector weighted by the section sizes, which imposes sparsity on the solution.
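A minimal sketch of the relaxed manipulation follows: each entry of the attack feature vector selects a fraction of one benign section, the selected chunks are appended as an overlay, and the penalty is the dot product between section sizes and fractions. The function names are hypothetical, and taking the leading bytes of each section is an assumption of this sketch (the text only requires a fraction of the content).

```python
from typing import Sequence


def build_payload(sections: Sequence[bytes], s: Sequence[float]) -> bytes:
    """Concatenate the leading fraction s[i] of each benign section."""
    chunks = []
    for content, fraction in zip(sections, s):
        fraction = min(max(fraction, 0.0), 1.0)  # keep the entry inside [0, 1]
        chunks.append(content[: int(len(content) * fraction)])
    return b"".join(chunks)


def append_overlay(malware: bytes, payload: bytes) -> bytes:
    """Overlay appending: the payload is placed after the end of the file."""
    return malware + payload


def byte_penalty(section_sizes: Sequence[int], s: Sequence[float]) -> float:
    """Dot product between the section sizes and the attack feature vector."""
    return sum(size * fraction for size, fraction in zip(section_sizes, s))
```

Up to rounding of the chunk lengths, `byte_penalty` equals the number of bytes that `build_payload` produces, so a larger payload is penalized proportionally.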

III-C Black-box optimization algorithm

To compute adversarial malware examples while ignoring the implementation details of the target detector, we need a black-box optimization algorithm. An objective function is used to measure the distance from the optimal solution. The technique we use for conducting our experiments is a genetic algorithm, which generates new samples using the previous ones as parents. This strategy has also been explored and proposed by other work in the state of the art [xu2016automatically, labaca-castro2019aimed]. Algorithm 1 shows the structure of the algorithm.

Algorithm 1: Black-box optimization strategy. Pseudo-code of the genetic algorithm used for crafting payloads to inject into the input malware inside GAMMA.
Data: population size N, generations G, objective function F
Result: best candidate s*
S ← N random points
evaluate F on each s in S, querying the detector on g(x, s)
i ← 0
while i < G do
       S ← selection(S)
       S ← crossover(S)
       S ← mutate(S)
       i ← i+1
s* ← candidate in S with minimal F

The best solution is the one with minimal score. The objective function for our problem is formalized in Equation 4. The selection function takes the best elements from the set of candidates, i.e. those with the smallest score computed by the objective function. The crossover function swaps the values of a variable between two feature vectors, while the mutation function applies a random mutation inside a feature vector. At each round, a set of new candidates is produced and evaluated. Before moving to the next round, the initial generation is added again to the candidates, as some samples of the initial population may perform better than the mutated ones. In this way, the genetic process starts again, selecting the fittest candidates for survival, eventually reaching a local minimum for the problem. The algorithm terminates after $G$ generations; since every candidate of every generation is scored by the detector, the number of queries grows with the product of the population size $N$ and the number of generations. Our strategy sets the constraint over the total number of queries sent to the detector, that is $T$, so the attacker must choose $N$ and $G$ accordingly. In our context, each mutation is represented by the addition of benign content inside the input malware.
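The selection, crossover and mutation steps described above can be sketched as a small pure-Python genetic minimizer over vectors in the unit hypercube. This is a simplified stand-in, not the optimizer used in the experiments: the function name, the uniform crossover, the mutation rate and the choice to keep the best previous candidates in the pool (instead of re-adding the initial generation) are all assumptions of this sketch.

```python
import random


def genetic_minimize(objective, dim, pop_size, generations, mutation_rate=0.1, seed=0):
    """Minimize objective over [0, 1]^dim; returns (best_score, best_vector).

    The objective is called pop_size * (generations + 1) times in total.
    """
    if pop_size < 2:
        raise ValueError("pop_size must be at least 2")
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scored = [(objective(s), s) for s in population]
    for _generation in range(generations):
        # selection: keep the candidates with the smallest objective value
        scored.sort(key=lambda pair: pair[0])
        parents = [s for _, s in scored[: max(2, pop_size // 2)]]
        children = []
        for _child in range(pop_size):
            a, b = rng.sample(parents, 2)
            # crossover: each variable is inherited from one of the two parents
            child = [b[i] if rng.random() < 0.5 else a[i] for i in range(dim)]
            # mutation: occasionally replace a variable with a fresh random value
            child = [rng.random() if rng.random() < mutation_rate else v for v in child]
            children.append(child)
        # earlier candidates stay in the pool, so the best score never gets worse
        scored += [(objective(c), c) for c in children]
        scored = sorted(scored, key=lambda pair: pair[0])[:pop_size]
    return min(scored, key=lambda pair: pair[0])
```

Because each call to `objective` corresponds to one query to the detector, the query budget can be enforced simply by choosing `pop_size` and `generations` so that their product stays within it.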

IV Experiments

We want to show empirical evidence of the intuitions expressed in Section III, and in particular to highlight the regularization effects on the crafted attack payload. We ran our experiments against both EMBER and MalConv on a workstation equipped with an Intel® Xeon® CPU E5-2670, with 48 CPUs and 128 GB of RAM. The pre-trained version of MalConv has a slightly different architecture w.r.t. the original formulation: an input size of 1 MB and a padding value of 256, which avoids the shifting pre-processing step. The network is implemented using the Keras library [chollet2015keras]. For the black-box optimizer included in GAMMA, we rely on the Pagmo2 [francesco_biscani_2019_3464510] library. We tested the attack using a fixed population size, varying the number of generations. We used the default optional parameters offered by Pagmo2 to tune the algorithm, which comprise the choice of the selection, mutation and cross-over functions.
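The input encoding of the pre-trained MalConv variant described above can be sketched as follows. This is a hedged illustration only: the function name is hypothetical, 1 MB is assumed to mean 2^20 bytes, and the snippet covers just the truncate-or-pad step, not the Keras model itself.

```python
MAX_LEN = 2 ** 20  # assumed meaning of "1 MB of input size"
PAD_VALUE = 256    # outside the byte range 0..255, so no shifting is needed


def to_model_input(data: bytes, max_len: int = MAX_LEN, pad_value: int = PAD_VALUE) -> list[int]:
    """Truncate the byte sequence to max_len, or pad it with pad_value."""
    values = list(data[:max_len])
    values.extend([pad_value] * (max_len - len(values)))
    return values
```

Since real bytes never take the value 256, the padding stays distinguishable from file content, which is what the shift-by-one trick achieves in the original formulation.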

IV-A Performance on test dataset in absence of attack

Fig. 3: Receiver Operating Characteristic (ROC) curve of both classifiers.

To evaluate the performance of both classifiers in the absence of attack, we collected a set of benign and malware samples.

Fig. 4: Attack performances by varying in the interval , using 300 malware samples as input, compared with the random attack. We sample values, observing the achieved trade-off between size and confidence. The solid lines are computed as a regression over the point of a particular setting of the experiments.

The malware samples were gathered from VirusTotal (https://www.virustotal.com), while the goodware samples were collected by downloading executable programs from GitHub. The results are shown in Figure 3. The threshold chosen for EMBER is 0.8336, which corresponds to an Area Under the Curve (AUC) of 0.98, while for MalConv the threshold is 0.5, which corresponds to an AUC of 0.93. These results are comparable to the description given by the authors of EMBER [anderson2018ember], as both detectors achieve just a slightly lower score w.r.t. what is reported in the paper. Still, they can both be used as a baseline for our analysis.

IV-B Attack performance

Since the attack is formalized as a regularization problem, we need to verify the effect of both constraints on the problem. The size constraint is encoded into the minimization objective by the regularizer, while the request constraint is handled by controlling the number of queries sent to the detector. Since the attack feature space is parametric over the number of sections the attacker may add, we extract the first 100 .data sections from our goodware dataset, as discussed in Section III-B, to be used for adding content to the input malware. Figure 4 shows the trade-off between size and confidence using different values for the regularizer and number of queries. We also report the results of appending random byte sequences of increasing length, showing that both classifiers are robust to such noise: the average accuracy is still near 99% with no significant variance. Instead, when the regularization parameter $\lambda$ is set to large values, the algorithm finds smaller solutions without caring much about the attributed score, as the quantity to minimize is dominated by the penalty term. As the value of $\lambda$ decreases, the algorithm finds more evasive samples with bigger payloads. By increasing the number of queries, the attacker explores more solutions in terms of both size and confidence.

Each point in Figure 4 corresponds to the mean confidence and the mean payload size for a specific value of the regularization parameter and number of generations used for crafting adversarial examples from a set of 300 malicious programs. The parameter controls the trade-off between size and accuracy. An increase of its value corresponds to a more regularized solution, leading to smaller but less effective payloads. When it tends to zero, the score attributed by the classifier dominates the penalty associated with the size. The black-box optimizer will thus find evasive payloads without taking into account the size of the solution. The effects mentioned above are enhanced by the number of optimization steps performed by the black-box algorithm. The space of the solutions is explored more by increasing the number of generations of the genetic algorithm, allowing it to craft adversarial examples that satisfy the constraints posed by the fitness function, as shown in Figure 4. It is easy to notice that the plots are shifted towards extreme solutions, transitioning from large-and-evasive payloads to short-and-ineffective ones. Since the threshold chosen for EMBER is high, most of the evasion attacks are, on average, effective and successful. It is still important to notice that the average accuracy does not decrease below a certain bound: many samples are still classified as malicious, while others are no longer detected and are assigned a high confidence of being goodware. This might be caused by the discrete partitions of space produced by the GBDT algorithm. The applied perturbations move a sample far enough from its original partition that it falls into an adjacent one with zero confidence of being malware. This phenomenon is also depicted in Figure 4 by the variance, highlighting the instability of both detectors against adversarial noise. However, since the function learnt by MalConv is continuous and differentiable, its results are smoother than those of EMBER.
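The trade-off described above can be illustrated with a minimal sketch. It assumes the objective has the form "detector confidence plus regularization parameter times payload size", which matches the description of the penalty term in this section. The `toy_score` function, the candidate payload sizes and the parameter values are hypothetical stand-ins for a real detector and a real search.

```python
import math

def toy_score(payload_kb: float) -> float:
    """Hypothetical detector: malware confidence decays as benign content is added."""
    return math.exp(-payload_kb / 50.0)

def penalized_objective(payload_kb: float, lam: float) -> float:
    """Assumed objective form: confidence plus lam times the payload size."""
    return toy_score(payload_kb) + lam * payload_kb

def best_payload(lam: float, candidates):
    """Pick the candidate size that minimizes the penalized objective."""
    return min(candidates, key=lambda kb: penalized_objective(kb, lam))

# Hypothetical payload sizes (in KB) and regularization values.
candidates = [0, 10, 25, 50, 100, 200, 400]
for lam in (1e-1, 1e-2, 1e-3, 1e-5):
    kb = best_payload(lam, candidates)
    print(f"lam={lam:g}: payload={kb} KB, confidence={toy_score(kb):.3f}")
```

With these toy numbers, a large parameter value selects the empty payload, while values close to zero select the largest payload, mirroring the two extremes discussed in the text.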

IV-C Temporal analysis

From a temporal point of view, the complexity of GAMMA is dominated by the time spent querying the detector, which is encapsulated by the invocation of the objective function. Figure 5 shows the speed of GAMMA, highlighting the linearity of the approach, as the time spent optimizing the attack increases linearly with the number of optimization steps. Unsurprisingly, the time spent computing adversarial malware examples against EMBER is larger than the time spent against MalConv, as the feature extraction step performed by EMBER is more demanding in terms of computation and time. Nevertheless, the total time required to optimize our black-box attack samples remains quite low, as we do not need to iteratively execute any computationally-demanding validation in a sandbox environment.
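Since the cost is dominated by detector queries, one way to make this explicit is to wrap the detector in an object that counts queries, accumulates the time spent in them, and enforces a budget. The sketch below is illustrative only: `BudgetedDetector` and `dummy_score` are hypothetical names, and the dummy scoring function merely stands in for a real classifier.

```python
import time

class BudgetedDetector:
    """Wraps a scoring function, counting queries and enforcing a query budget."""

    def __init__(self, score_fn, max_queries: int):
        self.score_fn = score_fn
        self.max_queries = max_queries
        self.queries = 0
        self.elapsed = 0.0  # total seconds spent inside the detector

    def __call__(self, sample: bytes) -> float:
        if self.queries >= self.max_queries:
            raise RuntimeError("query budget exhausted")
        start = time.perf_counter()
        score = self.score_fn(sample)
        self.elapsed += time.perf_counter() - start
        self.queries += 1
        return score

def dummy_score(sample: bytes) -> float:
    """Hypothetical stand-in for a detector: fraction of non-zero bytes."""
    return sum(b != 0 for b in sample) / max(len(sample), 1)

detector = BudgetedDetector(dummy_score, max_queries=3)
scores = [detector(bytes([i, 0, 0, 1])) for i in range(3)]
print(detector.queries, scores)
```

If each query takes roughly constant time, the accumulated `elapsed` value grows linearly with the number of queries, which is the behaviour the section reports.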

Fig. 5: Time spent crafting adversarial examples when varying the number of generations. Each solid line corresponds to the elapsed time of the black-box attack against EMBER and MalConv.

IV-D Packing effect

Fig. 6: Performance of both classifiers against the UPX packer (panel (b): MalConv). Each box-plot shows the distribution of the confidences attributed by the classifiers under analysis, while the red horizontal line corresponds to the threshold used by the detector.

Since these classifiers leverage only static features, it is reasonable to ask whether encrypting the program content is already sufficient to evade detection, without applying all the techniques we have introduced in Section III. Packing is a technique that was introduced for reducing the size of an executable by applying a compression, encryption or encoding algorithm. Since a packer completely changes the on-disk representation of the program, it has been extensively used by malware vendors to hide their products from analysts, increasing the difficulty of reverse engineering the sample. In this context, we apply one well-known packer, UPX (https://upx.github.io), to 1000 malware and 1000 goodware programs, and we test the evasion rate for both MalConv and EMBER. We show in Figure 6(a) and 6(b) the effectiveness of the UPX packer. It is clear that both detectors attribute a malicious score when the sample is packed, as can be seen from the box plot of the packed goodware programs. Both detectors shift their scores toward the malware class, while there is only a little change in terms of mean and variance for the packed malware. We believe that a packing technique can be useful for hiding from a statistical detector as long as the algorithm has not been trained on samples packed with that method. On the contrary, given enough samples packed with a technique, the learning algorithm should be able to capture the signature left by the packer itself inside the packed program. For instance, the UPX packer creates two executable sections called UPX0 and UPX1, which contain the extraction code and the original compressed program. It is likely that a learning algorithm would take advantage of the presence of such signatures for discriminating between benign and malicious programs. Evasion is more likely to be achieved using unseen packers, i.e. custom solutions developed by malware vendors themselves. Since writing a packer is difficult, time consuming and error-prone, we believe that adding content instead of perturbing the whole executable is stealthier and easier to implement.
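To illustrate how visible such a packer signature is, the following sketch reads the section names from the PE section table and checks for UPX0 and UPX1. It relies only on the standard PE header layout (the `e_lfanew` field at offset 0x3C, a 20-byte COFF header, 40-byte section headers). The helper `build_toy_pe` is hypothetical and only builds a minimal, non-executable header blob for the demo; a real detector would learn such cues from data rather than hard-code them.

```python
import struct

def section_names(pe_bytes: bytes) -> list:
    """Return the section names listed in the section table of a PE file."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("missing DOS header magic")
    # e_lfanew (offset 0x3C) holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    (num_sections,) = struct.unpack_from("<H", pe_bytes, coff + 2)
    (opt_size,) = struct.unpack_from("<H", pe_bytes, coff + 16)
    table = coff + 20 + opt_size  # section table follows the optional header
    names = []
    for i in range(num_sections):
        raw = pe_bytes[table + 40 * i: table + 40 * i + 8]
        names.append(raw.rstrip(b"\x00").decode("ascii", errors="replace"))
    return names

def looks_upx_packed(pe_bytes: bytes) -> bool:
    """Heuristic: both UPX section names are present."""
    names = section_names(pe_bytes)
    return "UPX0" in names and "UPX1" in names

def build_toy_pe(names) -> bytes:
    """Hypothetical helper: minimal header blob (not a runnable program)."""
    dos = bytearray(64)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 64)
    coff = struct.pack("<HHIIIHH", 0x14C, len(names), 0, 0, 0, 0, 0)
    sections = b"".join(n.encode("ascii").ljust(8, b"\x00") + bytes(32) for n in names)
    return bytes(dos) + b"PE\x00\x00" + coff + sections

print(looks_upx_packed(build_toy_pe(["UPX0", "UPX1", ".rsrc"])))
print(looks_upx_packed(build_toy_pe([".text", ".data"])))
```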

IV-E Evaluation on Commercial Products

Fig. 7: Transferring the attack against the antivirus engines offered by VirusTotal. On the left, each point represents the number of malware samples flagged as malicious by a particular antivirus program, before and after having computed the attack on the EMBER classifier. On the right, each point stands for the detection ratio after both attacks.

We are interested in understanding the effect of our methodology on commercial detectors, as we want to test these small perturbations against such products. In this context, we are not interested in packing the input samples, as we believe that these products should detect a threat even if the difference between the two versions is a small appended payload. The mutations we apply to our malware samples address only the syntactical structure of each program, and we aim to evaluate here whether the application of such transformations can pose a threat to other antivirus programs. We expect that most of the commercial solutions will not be affected by such attacks. We rely on the responses retrieved from VirusTotal (https://virustotal.com), which is an online interface for many threat detectors. The service offers an API that can be used for querying the system by uploading samples remotely. We test the performance of our attack by sending 128 malware samples, before and after attaching to each sample the adversarial payload optimized against the EMBER classifier. To validate the results of the optimization attack, we performed a test using the same data with a random payload of 14 KB attached. Since the attack is optimized against machine-learning classifiers that extract knowledge only from the structure of an input program, we believe that commercial products (which also use dynamic analysis) should be affected neither by our attack strategy nor by the appending of small chunks of bytes chosen at random. Both plots in Figure 7 show the results of this experiment. Each point in Figure 7 (left) represents the number of detections achieved by a particular antivirus, before and after the application of the adversarial payload. We obtained the following results:

  • there are some antivirus programs that did not detect anything, possibly due to an internal VirusTotal timeout. They are the ones in the bottom-left corner of the plot;

  • there are products that detected more adversarial versions of the input samples than their original representation. This implies that, for some engines, the attachment of such a payload is an indicator of malicious behavior. They are the ones lying above the diagonal;

  • there are engines that are fooled multiple times by the attack we performed against EMBER. They are recognizable as they lie below the diagonal.

By looking at Figure 7 (right), we can appreciate that most of the adversarial attacks are more effective than the random attacks. Since each point is computed as the ratio of the number of detections after the attack to the number of detections before the attack, the classifiers that fall below the dashed line are fooled more often by the adversarial attack than by the random one. To sum up, these detectors, like the statistical algorithms we used for our analysis, manifest a weakness against small perturbations, even if the attack is not crafted against them.
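The ratio used for this comparison can be written down in a few lines. The sketch below is only an illustration: the engine names and detection counts are hypothetical, not measurements from the experiment, and treating a zero baseline as undefined is an assumption made here for engines that never flagged anything.

```python
import math

def detection_ratio(before: int, after: int) -> float:
    """Detections after the attack divided by detections before it."""
    if before == 0:
        return math.nan  # engine never flagged anything; ratio is undefined
    return after / before

def fooled_more_by_adversarial(before: int, after_adv: int, after_rand: int) -> bool:
    """True when the adversarial payload lowers detections more than the random one."""
    return detection_ratio(before, after_adv) < detection_ratio(before, after_rand)

# Hypothetical per-engine counts: (before, after adversarial, after random).
engines = {
    "engine_a": (100, 40, 90),
    "engine_b": (80, 85, 80),
    "engine_c": (0, 0, 0),
}
for name, (before, adv, rnd) in engines.items():
    print(name,
          detection_ratio(before, adv),
          detection_ratio(before, rnd),
          fooled_more_by_adversarial(before, adv, rnd))
```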

V Related work

There are other approaches that differ from GAMMA, as they consider different settings and solutions. Other works explore the creation of adversarial examples for information-security detectors, leveraging both gradient-based and black-box algorithms. Moreover, many algorithms are inspired by attacks proposed against classifiers that do not belong to any security-related domain.

V-A Competing approaches to GAMMA

Compared to the work proposed by Castro et al. [castro2019armed, labaca-castro2019aimed], we do not need to validate the malware inside a sandbox, as we include domain knowledge inside the mutation process. Moreover, we did not impose a threshold as a stopping condition for our strategy, as we aim to show how much we can degrade the performance of a target classifier with this black-box algorithm. From the attacker's side, we would choose to stop the algorithm as soon as the adversarial malware has successfully evaded detection, saving both computational power and time. Moreover, the authors of these works state that they need approximately 4 minutes for creating adversarial malware, using 100 queries. No architecture details have been disclosed. Our method performs 1,000 queries in the same time, showing how much the functionality-preserving transformations help the performance of the attack. They also do not report which mutations are the most influential in leading to evasion. This information is crucial, since we are dealing with potential vulnerabilities that lie inside statistical algorithms, whose presence is less evident compared to other security breaches.

Anderson et al. [anderson2017evading] propose a reinforcement learning approach to decide the best sequence of manipulations that leads to evasion. To test the effectiveness of the agent, they also test the application of manipulations picked at random. Results show that the learned policy performs slightly better than the random one. The authors do not report the resulting file size of the adversarial malware: the reinforcement learning method contains actions that enlarge the representation on disk, but it is not clear how and by how much. The model they used as a baseline is an early version of the EMBER classifier we have analyzed in this work, trained on fewer samples.

Hu et al. [hu2017generating] develop a Generative Adversarial Network (GAN) [goodfellow2014generative] whose aim is to craft adversarial malware that bypasses a target classifier. The network learns which API imports should be added to the original sample. This is interesting, since the algorithm ignores the semantics of all the objects that are considered and is still able to create points that evade the target classifier. However, no real malware is crafted, as the attack only operates inside the feature space. The result of the GAN serves only as a hint for the attacker, to understand what should be changed inside the malware. In contrast, we create functioning malware, as real samples are generated each time.

A recap of the black-box attacks against Windows malware detectors can be found in Table I, where we compare the techniques we mentioned above with our method. ARMED [castro2019armed] explores the space of the attack using random mutations, while all the other techniques focus on guided optimization. Moreover, ARMED [castro2019armed] and AIMED [labaca-castro2019aimed] require validation through a sandbox, while all the other techniques do not. MalGAN [hu2017generating] does not create any functioning malware, since the attack is crafted inside the feature space and the manipulations are not propagated back to the real malware samples. The reinforcement learning (RL) algorithm proposed by Anderson et al. [anderson2017evading] does not highlight successful results, as the optimizer is not able to really explore the space of solutions. Both ARMED and AIMED use the transformations introduced by Anderson et al., and it is not clear whether the RL algorithm breaks the functionality of the malware it mutates. These techniques present different results, and more investigation is required to determine their stability.

TABLE I: State of the art of black-box adversarial attacks against Windows malware detectors.
Columns: No sandbox; Functioning malware; Feasible manipulations; Attack optimization; Payload size optimization.
Rows: ARMED [castro2019armed]; AIMED [labaca-castro2019aimed]; Anderson et al. [anderson2017evading]; MalGAN [hu2017generating].

V-B Gradient attacks against malware detectors

Kolosnjaji et al. [kolosnjaji2018adversarial] apply a gradient-based attack against MalConv, by appending bytes to the overlay. Since MalConv is not fully differentiable, the attack takes place inside the embedding space. The gradient allows the attacker to append the bytes that most reduce the confidence assigned to the input malware. Experimental results show that the number of padding bytes needed to evade ranges from 2 KB to 10 KB. Demetrio et al. [demetrio2019explaining] fine-tune the attack proposed by Kolosnjaji et al. by generalizing the algorithm w.r.t. the location where the perturbations are applied, and showing that the same network can be bypassed by manipulating only 58 bytes inside the DOS header of an executable. Similarly, Kreuk et al. [kreuk2018adversarial] applied the Fast Gradient Sign Method (FGSM) [goodfellow6572explaining] to alter not only padding, but also slack bytes, i.e. bytes inserted between sections to maintain alignment. However, they show that the slack locations are not enough for crafting an adversarial executable, and they need to include padding as well. The attack is again formulated in the embedding space, and the real bytes are only computed after the algorithm has found an evading sample. Rosenberg et al. [rosenberg2018generic] attack a Recurrent Neural Network (RNN) using a black-box strategy [papernot2017practical] by reconstructing the target classifier under attack. They fool the proposed RNN by injecting fake API calls at run-time, wrapping the input malware inside another program with the correct sequence of API calls that need to be issued for producing the sample. While the wrapper they developed is interesting as a different way of mutating malware, neither code nor results have been publicly released by the authors. Suciu et al. [suciu2019exploring] explored an approach similar to that of Kreuk et al., applying the FGSM to compute adversarial payloads to be inserted between sections (if there is available space) and as padding. No code has been released yet for testing the attack.
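Attacks formulated in the embedding space need a final step that turns perturbed embedding vectors back into concrete bytes. One common way to do this is a nearest-neighbour lookup over the embedding table; the sketch below shows that idea only and is not claimed to reproduce the procedure of the cited works. The tiny two-dimensional `toy_embedding` table (four "byte values" instead of 256) and the perturbed vectors are hypothetical.

```python
def squared_distance(u, v) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest_byte(vector, embedding) -> int:
    """Index of the embedding row closest to the given vector."""
    return min(range(len(embedding)),
               key=lambda b: squared_distance(embedding[b], vector))

def embeddings_to_bytes(vectors, embedding) -> list:
    """Map each perturbed embedding vector to its closest byte value."""
    return [nearest_byte(v, embedding) for v in vectors]

# Hypothetical embedding table: row i is the embedding of byte value i.
toy_embedding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical output of a gradient step performed in the embedding space.
perturbed = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]
print(embeddings_to_bytes(perturbed, toy_embedding))
```

Because the projection changes the vectors, a sample that evades in the embedding space may need to be re-scored after the bytes are reconstructed.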

V-C Black-box attacks against other security-related detectors

Xu et al. [xu2016automatically] propose a fully black-box attack scheme that relies on a genetic algorithm to produce adversarial examples. They show that this approach is suitable for security applications, as they successfully create functioning adversarial PDF malware that evades state-of-the-art PDF malware detectors. On the other hand, since this method ignores the internal technical details of the PDF format, the black-box optimization produces both functioning and broken PDF malware, and the latter is later discarded by the evolution. As shown by the authors, this is very time consuming, as the algorithm needs to learn which manipulations break the original semantics and which do not. Laskov et al. [laskov2014practical] show a black-box attack against PDFRate [smutz2012malicious], which is a PDF malware detector. The authors simulate an adversary that has only partial information regarding the classifier under attack. They reconstruct the victim model and perform gradient attacks against it, transferring them to the real one. The attack injects content inside the PDF malware, successfully fooling both the surrogate and the target.

V-D Generic Black-box attacks in other domains

Papernot et al. [papernot2017practical] show that it is possible to evade an unknown classifier by reconstructing a local surrogate model. The attacker needs to query the target classifier to construct a surrogate dataset that will be used for training the model. The adversary computes the attack against the trained local classifier and tries to transfer the samples to the target as well. This is time consuming, as the attacker needs to send many queries to the target system: all samples inside the surrogate dataset need to be labeled. Ilyas et al. [ilyas2018black] apply the Natural Evolution Strategy (NES) [wierstra2008natural] to reduce the number of queries that are needed to compute the black-box attack, by imposing a distribution over the manipulations and using the ones that allow the algorithm to explore more. Chen et al. [chen2017zoo] propose the so-called Zeroth Order Optimization (ZOO), which estimates the gradient of the victim classifier by sampling points around the input. This strategy reduces the number of queries, as the search is local and only around a particular point.
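The idea of estimating a gradient purely from queries can be sketched with a symmetric difference quotient, which costs two queries per coordinate. This is a generic illustration of zeroth-order estimation and not a reproduction of the cited implementation; `toy_score`, the step size `h` and the input point are hypothetical.

```python
def estimate_gradient(f, x, h: float = 1e-4) -> list:
    """Estimate the gradient of f at x using only function evaluations."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[i] += h
        minus[i] -= h
        # Symmetric difference quotient along coordinate i (two queries).
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

def toy_score(x) -> float:
    """Hypothetical black-box score with known gradient [2 * x[0], 3]."""
    return x[0] ** 2 + 3 * x[1]

grad = estimate_gradient(toy_score, [1.0, 2.0])
print(grad)
```

For an input with many coordinates the query count grows quickly, which is why such estimates are usually restricted to a local neighbourhood or a subset of coordinates.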

V-E Malware detection through machine learning

Detecting malware is a difficult task, and the state of the art abounds with possible solutions to this problem. Saxe et al. [saxe2015deep] develop a deep neural network which is trained on top of a feature extraction phase. The authors consider imports, byte and string distributions along with metadata taken from the headers, for a total of 1024 input variables. The method seems promising, as the results show a ROC of 99% on their dataset. However, even if they claim to have released all code and data, no source is available online for further testing or investigation. Kolosnjaji et al. [kolosnjaji2016deep] propose to track which APIs are called by malware, capturing the execution trace using the Cuckoo sandbox (https://cuckoosandbox.org/), which is a dynamic analysis virtual environment for testing malware. Again, neither source code nor pre-trained models have been publicly released. Hardy et al. [hardy2016dl4md] statically extract which APIs are called by a program, and they train a deep network over this representation. No source code is available for testing. David et al. [david2015deepsign] develop a network that learns new signatures from input malware, by posing the issue as a reconstruction problem. The network infers a new representation of the data in an end-to-end fashion. These new signatures can be used as input for other machine learning algorithms. The source code for reconstructing the method is available on GitHub, but no pre-trained models are available for evaluation. Incer et al. [incer2018adversarially] tackle the issue of adversarially robust classifiers by imposing monotonic constraints over the features used for the classification tasks. Results are interesting, but again neither code nor pre-trained models are publicly available.

VI Limitations and Open Issues

We have shown that an attacker can optimize the injection of benign content into malware programs, achieving evasion against learning-based malware detectors. This approach is clearly not effective against systems that use features computed by dynamically executing the input program. However, even if malware detectors based on different feature representations are not affected by our attack, the attacker may be able, at least in principle, to identify feasible transformations that can alter their input representation. In this case, our attack can be used to optimize such transformations and evade dynamic malware detectors with minimal modifications.

Another limitation of our approach is posed by the nature of the feasible manipulations chosen by the attacker. Since the attack feature space is mathematically formalized as a space of predefined static manipulations, it cannot include randomly and dynamically generated content, such as random byte sequences. To mimic the generation of random content, the attacker should create a set of random sequences that will not change during the optimization of the attack.
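A fixed pool of random sequences can be obtained by generating it once from a seeded generator before the optimization starts. The sketch below assumes, for illustration, that each entry of the attack vector is a value in [0, 1] selecting what fraction of the corresponding sequence is injected; the function names, pool size and seed are hypothetical.

```python
import random

def build_random_pool(n_sequences: int, length: int, seed: int = 0) -> list:
    """Generate a fixed pool of random byte sequences from a seeded generator."""
    rng = random.Random(seed)
    return [bytes(rng.getrandbits(8) for _ in range(length))
            for _ in range(n_sequences)]

def payload_from_vector(pool, vector) -> bytes:
    """Assumed encoding: vector[i] in [0, 1] is the fraction of sequence i to inject."""
    parts = []
    for seq, frac in zip(pool, vector):
        frac = min(max(frac, 0.0), 1.0)  # clamp to the valid range
        parts.append(seq[: int(round(frac * len(seq)))])
    return b"".join(parts)

pool = build_random_pool(n_sequences=3, length=16, seed=42)
payload = payload_from_vector(pool, [1.0, 0.5, 0.0])
print(len(payload))
```

Because the pool depends only on the seed, the same attack vector always produces the same payload, so the manipulations stay static throughout the optimization.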

VII Conclusions and Future work

We propose a black-box evasion strategy that considers only feasible manipulations, i.e. transformations applied to the input malware that do not alter its original semantics. We show that it is possible to omit the validation step and speed up the computation of such adversarial examples. We develop a mathematical framework for describing the capabilities of an adversary, the attack feature space. It provides the attacker with a multi-dimensional space, in which each axis corresponds to a particular transformation that the attacker can apply to the input malware. To impose the application of small perturbations, we add a regularization term to our objective that penalizes the injection of larger payloads. This term is calibrated by a regularization parameter that controls the norm of the attack vector. Since this is a black-box attack, the adversary wants to send as few queries as possible. To express this constraint, we limit the number of requests sent to the detector by stopping the algorithm after having reached the desired number of queries.
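The ingredients summarized above (a vector of manipulations, a size penalty, and a query budget) can be combined in a small genetic-algorithm sketch. It is a toy under stated assumptions and not the GAMMA implementation: `toy_detector` is a hypothetical score that decreases with injected content, each vector entry is assumed to be the fraction of one benign section that is injected, and the selection, crossover and mutation rules are generic choices.

```python
import random

def toy_detector(payload_size: float) -> float:
    """Hypothetical score in (0, 1]: lower when more benign content is injected."""
    return 1.0 / (1.0 + payload_size / 20.0)

def fitness(vector, section_sizes, lam: float) -> float:
    """Assumed objective: detector score plus lam times the injected size."""
    size = sum(v * s for v, s in zip(vector, section_sizes))
    return toy_detector(size) + lam * size

def genetic_attack(section_sizes, lam, max_queries, pop_size=10, seed=0):
    """Minimize the fitness with a simple genetic algorithm under a query budget."""
    rng = random.Random(seed)
    k = len(section_sizes)
    population = [[rng.random() for _ in range(k)] for _ in range(pop_size)]
    queries, best, best_fit = 0, None, float("inf")
    while queries + pop_size <= max_queries:
        scored = []
        for vec in population:
            scored.append((fitness(vec, section_sizes, lam), vec))
            queries += 1  # one detector query per evaluated candidate
        scored.sort(key=lambda t: t[0])
        if scored[0][0] < best_fit:
            best_fit, best = scored[0]
        parents = [vec for _, vec in scored[: pop_size // 2]]
        population = []
        while len(population) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = a[:cut] + b[cut:]            # one-point crossover
            i = rng.randrange(k)                 # mutate a single entry
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            population.append(child)
    return best, best_fit, queries

best, best_fit, queries = genetic_attack([10.0, 20.0, 30.0], lam=0.001, max_queries=100)
print(queries, round(best_fit, 4))
```

Stopping is driven only by the query counter, which mirrors the request constraint described in the text; an attacker could additionally stop as soon as the score drops below the detection threshold.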

We successfully land evasion attacks against two well-known baseline detectors using GAMMA, i.e. an implementation of the formalization we have introduced in this work. Results are more accurate as the number of queries sent to the detector increases, since our algorithm is allowed to explore a larger portion of the solution space and thus finds better local optima. The main vulnerability of the two detectors under analysis is posed by the presence of byte-based features, which can be easily manipulated by the attacker. We thus discourage using only such features for the purpose of malware detection.

We also show that this attack transfers to some commercial detectors hosted by VirusTotal, reducing the number of detected threats w.r.t. the original detection rate. The transfer attack is also more effective than the injection of randomly generated payloads: we hypothesize that a direct attack against these detectors might be even more effective, since our strategy would optimize the payloads for these particular systems.

However, this attack only works against static detectors. We would like to explore the world of dynamic analysis, studying how an attacker can manipulate the flow of execution to their advantage, fooling this kind of classifier in the process. This requires a more in-depth study of the loading and execution process of a program: altering the control flow of an executable implies adding code, API imports and function calls without undermining the original malicious behaviour. Since many malware samples are packed and highly obfuscated, it is not yet clear which transformations might be applied safely, and how. In this context, the attacker needs new feasible manipulations suitable to the dynamic analysis scenario.

The combination of both static and dynamic evasion would pose an interesting yet problematic line of research for these new learning detectors, by effectively exposing their weaknesses, and proposing new hardening schemes that can be applied to increase robustness against adversarial malware.


This work was partly supported by the PRIN 2017 project RexLearn (grant no. 2017TWNMH2), funded by the Italian Ministry of Education, University and Research.