Adversarial examples, the artifacts created by test-time evasion attacks against machine-learning (ML) algorithms, have recently emerged as a threat to ML-based systems. For example, adversarial examples can enable attackers to impersonate users that are enrolled in face-recognition systems [91, 92], fool street-sign recognition algorithms into misclassifying street signs, and trick voice-controlled interfaces into misinterpreting commands [20, 78, 88].
In particular, adversarial examples are a potential threat to malware detection—a fundamental computer-security problem that is increasingly addressed with the help of ML models (e.g., [4, 55, 80, 99]). In this domain, attackers are interested in altering programs to mislead ML-based malware detectors into misclassifying malicious programs as benign, or vice versa. In doing so, attackers face a non-trivial constraint: in addition to misleading the malware detectors, any alteration of a program must not change its original, intended, functionality. For example, a keylogger altered to evade being detected as malware should still carry out its intended function, including invoking necessary APIs, accessing sensitive files, and connecting to attackers’ servers. This constraint is arguably more challenging than ones imposed by other domains (e.g., evading image recognition without making the changes conspicuous to humans [30, 91, 92]) as it is less amenable to being encoded into traditional frameworks for generating adversarial examples, and most changes to byte values are likely to break a program’s syntax or semantics. In this work, we show that the constraint of preserving functionality can be incorporated into the process of generating adversarial examples to fool state-of-the-art deep neural networks (DNNs) for malware detection [55, 80].
Roughly speaking, malware-detection methods can be categorized as dynamic or static [11, 99]. Dynamic methods execute programs to learn behavioral features that can be used for classification. In contrast, static methods (e.g., [4, 55]) classify programs using features that can be computed without execution. While potentially more accurate, dynamic methods are more computationally expensive and, consequently, less widely deployed [11, 49]. Therefore, we focus on static methods.
Several attacks have been proposed to generate adversarial examples against DNNs for static malware detection [26, 50, 56, 97]. To fool the DNNs while preserving functionality, these attacks introduce adversarially crafted byte values in regions that do not affect execution (e.g., at the end of programs or between sections). These attacks can be defended against by masking out or removing the added content before classification; we confirm this empirically.
In this paper, we show how binary-diversification tools (tools for transforming programs at the binary level to create diverse variants of the same program) that were originally proposed to defend against code-reuse attacks [54, 77] can be leveraged to evade malware-detection DNNs. While these tools preserve the functionality of programs after transformation by design, they are ineffective at evading malware detection when applied naïvely (e.g., via functionality-preserving randomization). To address this, we propose optimization algorithms to guide the transformations of binaries to fool malware-detection DNNs, both in settings where attackers have access to the DNNs’ parameters (i.e., white-box) and ones where they have no access (i.e., black-box). The algorithms we propose can produce program variants that often fool DNNs in 100% of evasion attempts. Perhaps most worryingly, we find that the attack samples produced by the algorithms are also often effective at evading commercial malware detectors (in some cases with success rates as high as 85%). Because our attacks transform functional parts of programs, they are particularly difficult to defend against, especially when augmented with methods to deter static and dynamic analyses. We explore potential mitigations to the attacks that we propose (e.g., via preprocessing programs to normalize them before classification [3, 18, 103]), but conclude that attackers may adapt to circumvent these mitigations. This leads us to advocate against relying only on ML-based techniques for malware detection, as is becoming increasingly common.
In a nutshell, the contributions of our paper are as follows:
We propose a novel functionality-preserving attack against DNNs for malware detection from raw bytes (Sec. III). The attack uses binary-diversification techniques in a novel way to prevent defenses applicable to prior attacks, and is applicable both in white-box and black-box settings.
We evaluate and demonstrate the effectiveness of the proposed attack in different settings, including against commercial anti-viruses (Sec. IV). We also compare our attack with prior attacks, and show that it achieves comparable or higher success rates, while being more challenging to defend against.
We explore the effectiveness of prior and new defenses against our proposed attack (Sec. V). While several defenses seem promising to defend against specific variants of the attack, we warn against the risk of adaptive attackers.
Next, we review some background and related work.
II Background and Related Work
We start this section with background on DNNs for malware detection, which are the main target of our attacks. Then, we discuss research on attacking and defending ML algorithms generally, and malware detection specifically. We end the section with background on binary randomization and rewriting methods, which serve as building blocks for our attacks.
II-A DNNs for Malware Detection
In this work, we study attacks targeting two DNN architectures for detecting malware from the raw bytes of Windows binaries (i.e., executables in Portable Executable format) [55, 80]. The main appeal of these DNNs is that they achieve state-of-the-art performance using automatically learned features, instead of manually crafted features that require tedious human effort (e.g., [4, 51, 44]). In fact, due to their desirable properties, computer-security companies use DNNs similar to the ones that we study (i.e., ones that operate on raw bytes and use convolutional architectures) for malware detection. As these DNNs classify binaries without executing them, they fall under the category of static detection methods [11, 99].
In contrast to image-based classifiers, which classify inputs from continuous domains, the malware-detection DNNs classify inputs from a discrete domain: the byte values of binaries. To this end, the DNNs were designed with initial embedding layers that map each byte in the input to a vector of real numbers. Once the input is represented in a real vector space after the embedding, standard convolutional and non-linear operations are performed by subsequent layers.
II-B Attacking and Defending ML Algorithms
Attacks on Image Classification Adversarial examples (inputs that are minimally perturbed to fool ML algorithms) have emerged as a challenge to ML. The majority of attacks in prior work (e.g., [8, 10, 13, 33, 71, 75, 98, 105]) focused on DNNs for image classification, and on finding adversarial perturbations that have a small $L_p$-norm ($p$ typically $\in \{0, 2, \infty\}$) and lead to misclassification when added to input images. By limiting perturbations to small $L_p$-norms, attacks aim to ensure that the perturbations are imperceptible to humans. Attacks are often formalized as optimization processes. For example, Carlini and Wagner proposed the following formulation for finding adversarial perturbations that target a class $c_t$ and have small $L_2$-norms:
$$\arg\min_{r} \; \mathit{Loss}_{cw}(x + r, c_t) + \kappa \cdot \|r\|_2$$
where $x$ is the original image, $r$ is the perturbation, and $\kappa$ is a parameter to tune the $L_2$-norm of the perturbation. $\mathit{Loss}_{cw}$ is a function that, when minimized, leads $x + r$ to be (mis)classified as $c_t$. It is roughly defined as:
$$\mathit{Loss}_{cw}(x + r, c_t) = \max_{c \neq c_t}\{L_c(x + r)\} - L_{c_t}(x + r)$$
where $L_c$ is the output for class $c$ at the logits of the DNN (the output of the one-before-last layer). Our attacks use $\mathit{Loss}_{cw}$ to mislead the malware-detection DNNs.
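The rough definition of the Carlini and Wagner loss above can be illustrated with a few lines of Python. This is only a sketch: the function name `cw_loss` and the toy logit values are assumptions made for illustration, and refinements used in practice (such as clipping the loss at a confidence margin) are omitted.

```python
def cw_loss(logits, target):
    """Rough Carlini-Wagner loss: largest non-target logit minus the target logit."""
    if not 0 <= target < len(logits):
        raise ValueError("target class out of range")
    best_other = max(v for c, v in enumerate(logits) if c != target)
    return best_other - logits[target]


# Toy two-class detector (hypothetical values): index 0 = benign, index 1 = malware.
logits = [1.5, 4.0]
print(cw_loss(logits, target=0))  # positive: the input is not yet classified as benign
print(cw_loss(logits, target=1))  # negative: malware is already the top class
```

The loss becomes negative exactly when the target class has the largest logit, which is why driving it down corresponds to driving the input toward the target class.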
Attacks with Complex Objectives Most early attacks reduce the $L_p$-norms of adversarial perturbations to maintain perceptual similarity between the original images and their corresponding adversarial examples. However, small $L_p$-norms may be unnecessary or insufficient for maintaining perceptual similarity [29, 90]. Moreover, while reducing the $L_p$-norms is an objective that can easily be integrated into processes for generating adversarial examples, it is unlikely by itself to result in attacks that affect systems in practice. Therefore, follow-up attacks proposed adversarial examples that satisfy objectives other than similarity as measured in $L_p$-norms. Similarly to the attacks that we explore in this work, which aim to preserve the functionality of transformed binaries, other attacks aim to preserve certain properties of the adversarial artifacts that are critical for their practicality. For example, in the realm of face and street-sign detection and recognition, researchers proposed ways to change the physical appearance of faces and street signs to mislead detection and recognition systems [14, 30, 91, 92]. In the speech-recognition domain, researchers showed how to modify audio signals slightly such that they would mislead speech-recognition systems when played [20, 78, 88]. Last, researchers also showed that text-classification methods (e.g., for sentiment analysis) can be misled while maintaining certain properties of the original text, such as its meaning [63, 76, 93].
Attacks on Malware Detection Multiple attacks were proposed to evade ML-based malware classifiers while preserving the (malicious) functionality of the malware. Some attacks (e.g., [25, 96, 102, 108]) tweak malware to mimic benign files (e.g., adding benign code-snippets to malicious PDF files). Other attacks (e.g., [2, 26, 35, 41, 50, 56, 97]) tweak malware using gradient-based optimizations or generative methods (e.g., to find which APIs to import). A third type of attack uses a combination of mimicry and gradient-based optimizations.
Unlike prior work that studied attacks against dynamic ML-based malware detectors, we explore attacks that target DNNs for malware detection from raw bytes (i.e., static detection methods). Furthermore, the attacks we explore do not take advantage of weaknesses in the feature-extraction process by introducing adversarially crafted bytes to unreachable regions of the binaries [50, 56, 97] (which may be possible to detect and sanitize statically, see Sec. IV-C), or by mangling bytes in the header of binaries (which can be stripped before classification). Instead, the attacks we propose transform the original code of binaries in a functionality-preserving manner to achieve misclassification. Nonetheless, we still compare the evasion success rates of our attack with a representative prior attack (see Sec. IV-C), to ensure that the additional properties are not achieved at the expense of lower evasion success.
More traditionally, attackers use various obfuscation techniques to evade malware detection. Packing [11, 85, 99, 100]—encrypting binaries’ code and data, and then decrypting them at run time—is commonly used to hide malicious content from static detection methods. As we explain later (Sec. III-A) we only consider unpacked binaries in this work, as is often the case for static analysis methods [11, 55]. Attackers also obfuscate binaries by substituting instructions with others or altering the control flow graphs of binaries [17, 16, 45, 99]. We demonstrate that such obfuscation methods do not fool the malware-detection DNNs when applied naïvely (see Sec. IV-B). To address this, our attacks leverage stochastic optimization techniques to guide the transformation of binaries and mislead malware detection.
Perhaps most closely related to our work is the recent work on misleading ML algorithms for authorship attribution [67, 79]. Meng et al. proposed an attack to mislead authorship attribution at the binary level. Unlike the attacks we propose, Meng et al. leverage weaknesses in feature extraction and modify debug information and non-loadable sections to fool the ML models. Furthermore, their method leaves a conspicuous footprint that the binary was modified (e.g., by introducing multiple data and code sections to the binaries). While this is potentially acceptable for evading author identification, it may raise suspicion when evading malware detection. Quiring et al. recently proposed an attack to mislead authorship attribution from source code. In a similar spirit to our work, their attack leverages an optimization algorithm to guide code transformations that change syntactic and lexical features of the code (e.g., switching between printf and cout) to mislead ML algorithms for authorship attribution.
Defending ML Algorithms Researchers in the area of adversarial ML are actively seeking ways to defend against adversarial examples. One line of work, called adversarial training, aims to train robust models largely by augmenting the training data with correctly labeled adversarial examples [33, 47, 46, 58, 64, 98]. Another line of work proposes algorithms to train certifiably (i.e., provably) robust defenses against certain attacks [21, 52, 61, 69, 109]. Unfortunately, these defenses are limited to specific types of perturbations (e.g., ones with small $L_p$-norms for specific values of $p$). Moreover, they often do not scale to large models that are trained on large datasets. As discussed in Sec. V, amongst other limitations, our evaluation shows that these defenses would also be too expensive to practically mitigate our attacks. Some defenses suggest that certain input transformations (e.g., quantization) can “undo” adversarial perturbations before classification [37, 62, 66, 86, 95, 106, 107]. In practice, however, it has been shown that attackers can adapt to circumvent such defenses [6, 5]. Additionally, the input transformations that have been explored in the image-classification domain cannot be applied in the context of malware detection. Prior work has also shown that adaptive attackers can circumvent methods for detecting the presence of attacks (e.g., [31, 34, 66, 68]). We expect that such attackers can circumvent attempts to detect our attacks as well.
Prior work proposed ML-based malware-classification methods designed to be robust against evasion [27, 44]. However, these methods either have low accuracy or target linear classifiers, which are unsuitable for detecting malware from raw bytes.
Fleshman et al. proposed to make malware-detection DNNs more robust by constraining the parameter weights in the last layer to non-negative values. Their approach aims to prevent attackers from introducing additional features to malware to decrease its likelihood of being classified correctly. While this rationale holds for single-layer neural networks (i.e., linear classifiers), DNNs with multiple layers constitute complex functions where the addition of features at the input may correspond to the deletion of features in deep layers. As a result of the misalignment between the threat model and the defense, we found that DNNs trained with this defense are as vulnerable to prior attacks as undefended DNNs. In contrast, Fleshman et al. report that their defense is effective against prior attacks. The failure of the attacks may have been caused by the obfuscated gradients phenomenon.
II-C Binary Rewriting and Randomization
Software diversification is a technique developed to produce diverse binary versions of programs, all with the same functionality, to resist different kinds of attacks, such as memory-corruption, code-injection, and code-reuse attacks. Diversification can be performed at the source-code level (via the development of multiple implementations), at compilation time (e.g., using a multicompiler), or after compilation (by rewriting and randomizing programs’ binaries). In this work, we build on diversification techniques after compilation, at the binary level, as they have wider applicability (e.g., self-spreading malware can use them to evade detection without having access to the source code), and are more efficient (producing the binary after a transformation does not require recompilation). Nevertheless, we expect that this work can be extended to work with different diversification methods.
There is a large body of work on binary rewriting from the programming-languages, computer-architecture, and computer-security communities (e.g., [38, 54, 53, 65, 77, 87, 104]). Some of the rewriting methods aim to achieve higher-performing code via relatively expensive search through the space of equivalent programs [65, 87]. Other methods significantly increase the size of binaries, or may leave a conspicuous sign that rewriting took place [38, 104]. We build on binary-randomization tools that have little-to-no effect on the size or run time of the randomized binaries, thus helping our attacks remain stealthy [54, 77]. We present these tools and our extensions thereof in the following section.
III Technical Approach
This section discusses the technical approach behind our attack. Before delving into the details, we first lay out the threat model.
III-A Threat Model
We assume that the attacker has white-box or black-box access to DNNs for malware detection that receive raw bytes of program binaries as input. In the white-box setting, the attacker has access to the DNNs’ architectures and weights and is able to efficiently compute the gradients of loss functions with respect to the DNNs’ input via forward and backward passes. On the other hand, the attacker in the black-box setting may only query the model with a binary and receive the probability estimate that the binary is malicious.
The weights of the DNNs are fixed and cannot be controlled by the attacker (e.g., by poisoning the training data). The attacker uses binary rewriting methods that are challenging to undo to manipulate the raw bytes of binaries and cause misclassification while keeping functionality intact. Attacks may seek to cause malware to be misclassified as benign or benign binaries to be misclassified as malware. The former type of attack may cause malware to circumvent defenses and be executed on a victim’s machine. The latter may be useful to induce false positives, which may lead users to turn off or ignore the defenses.
We also assume that the binaries are unpacked, as is often the case for static malware-detection methods [11, 55]. Detecting packed binaries and unpacking them are problems orthogonal to ours that have been addressed by other researchers (e.g., [11, 15, 100]). Nonetheless, adversaries may still use our attacks simultaneously with packing: as packed binaries are usually unpacked before being classified by static methods, adversaries can use our attacks to modify binaries before packing them so that the binaries would be misclassified once unpacked.
As is standard for ML-based malware detection from raw bytes in particular (Sec. II-A), and for classification of inputs from discrete domains in general, we assume that the first layer of the DNN is an embedding layer. This layer maps each discrete token from the input space to a vector of real numbers via a function $E(\cdot)$. When computing the DNN’s output $F(x)$ on an input binary $x$, one first computes the embeddings and feeds them to the subsequent layers. Thus, if we denote the composition of the layers following the embedding by $H(\cdot)$, then $F(x) = H(E(x))$. While the DNNs we attack contain embedding layers, our attacks conceptually apply to DNNs that do not contain such layers. Specifically, for a DNN function $F(x) = l_{n-1}(\dots l_{i+1}(l_i(\dots l_0(x) \dots)) \dots)$ for which the errors can be propagated back to the $(i+1)$th layer, the attack presented below can be executed by defining $E(x) = l_i(\dots l_0(x) \dots)$.
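The decomposition of the detector into an embedding followed by the remaining layers can be sketched as follows. Everything here is a toy stand-in chosen for illustration: the names `E`, `H`, and `F`, the embedding width, the random weights, and the mean-pool-plus-sigmoid body are assumptions, not the architectures attacked in this work.

```python
import math
import random

EMBED_DIM = 4  # hypothetical width; not the size used by the real DNNs

rng = random.Random(0)
# E: a lookup table with one real-valued vector per possible byte value.
EMBEDDING = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(256)]
# Weights of a toy stand-in for H (mean-pool, then a single logistic unit).
WEIGHTS = [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]


def E(binary: bytes):
    """Embed each byte of the binary as a vector of real numbers."""
    return [EMBEDDING[b] for b in binary]


def H(embedded):
    """Toy layers after the embedding: average the vectors, then apply a sigmoid."""
    pooled = [sum(col) / len(embedded) for col in zip(*embedded)]
    score = sum(w * p for w, p in zip(WEIGHTS, pooled))
    return 1.0 / (1.0 + math.exp(-score))


def F(binary: bytes) -> float:
    """Full model, read here as the probability that the binary is malicious."""
    return H(E(binary))


print(F(b"\x55\x8b\xec\x83\xec\x08"))
```

Because the bytes only enter the model through the lookup in `E`, gradients are available with respect to the embedded vectors but not with respect to the discrete bytes themselves, which is what motivates the transformation-based search described next.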
III-B Functionality-Preserving Attack
The attack we propose iteratively transforms a given binary $x$ of class $y$ ($y=0$ for benign binaries, and $y=1$ for malware) until misclassification occurs or a maximum number of iterations is reached. To keep the binary’s functionality intact, the types of transformations are limited to ones that preserve functionality. (The transformation types that we consider in this work are detailed below.) In each iteration, the attack picks a transformation type at random for each function, and attempts to transform the function using it. For instance, if the transformation type can replace certain instructions within a function with functionally equivalent ones, a random subset of those instructions will be selected for replacement. The attempted transformation is applied only if the DNN becomes more likely to misclassify the binary.
Alg. 1 presents the pseudocode of the attack in the white-box setting. The algorithm starts by transforming all the functions in the binary in an undirected way. Namely, for each function in the binary, a transformation type is selected at random from the set of available transformations. The transformation is then applied to that function. When there are multiple ways to apply the transformation to the function, one is chosen at random. The algorithm then proceeds to further transform the binary for up to $\mathit{niters}$ iterations. Each iteration starts by computing the embedding of the binary to a vector space, $\hat{e}$, and the gradient, $\hat{g}$, of the DNN’s loss function, $\mathit{Loss}_F$, with respect to the embedding (lines 4–5). The loss function we use is the Carlini and Wagner loss function ($\mathit{Loss}_{cw}$) presented in Sec. II. Ideally, to move the binary closer to misclassification, we would manipulate the binary so that the difference of its embedding from $\hat{e} + \alpha\hat{g}$ (for some scaling factor $\alpha$) is minimized (see prior work for examples [50, 56]). However, if applied without proper care, such manipulation would likely change the functionality of the binary or cause it to become ill-formed. Instead, we transform the binary via functionality-preserving transformations. As the transformation types are stochastic and may have many possible outputs (in some cases, more than can be feasibly enumerated), we cannot estimate their impact on the binary a priori. Therefore, we transform each function, $f$, by attempting to apply a randomly picked functionality-preserving transformation type (once per iteration); we apply the transformation only if it shifts the embedding in a direction similar to $\hat{g}$ (lines 6–14). More concretely, if $\hat{g}_f$ is the gradient with respect to the embedding of the bytes corresponding to $f$, and $\delta_f$ is the difference between the embedding of $f$’s bytes after the attempted transformation and its bytes before, then the transformation is applied only if the cosine similarity (or, equivalently for this sign test, the dot product) between $\hat{g}_f$ and $\delta_f$ is positive. Other optimization methods (e.g., genetic programming) and similarity measures (e.g., similarity in the Euclidean space) that we tested did not perform as well.
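The acceptance test described above can be sketched in a few lines of Python. The function names and the list-of-vectors representation are assumptions made for illustration, and the sketch assumes the function occupies the same number of bytes before and after the attempted transformation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def accept_transformation(grad_f, emb_before, emb_after):
    """Accept iff the embedding change points in a direction similar to the gradient.

    Each argument is a list of per-byte vectors covering one function's bytes.
    """
    flat_grad = [g for vec in grad_f for g in vec]
    flat_delta = [a - b
                  for va, vb in zip(emb_after, emb_before)
                  for a, b in zip(va, vb)]
    # The dot product has the same sign as the cosine similarity.
    return dot(flat_grad, flat_delta) > 0


# Hypothetical two-byte function with two-dimensional embeddings.
grad = [[1.0, 0.0], [0.0, 1.0]]
before = [[0.0, 0.0], [0.0, 0.0]]
print(accept_transformation(grad, before, [[0.5, 0.0], [0.0, 0.2]]))   # moves with the gradient
print(accept_transformation(grad, before, [[-1.0, 0.0], [0.0, 0.5]]))  # moves against it
```

Only the sign of the similarity matters for the decision, so the norms that distinguish cosine similarity from the dot product never need to be computed.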
If the input were continuous, it would be possible to perform the same attack in a black-box setting after estimating the gradients by querying the model. In our case, however, it is not possible to estimate the gradients of the loss with respect to the input, as the input is discrete. Therefore, the black-box attack we propose follows a general hill-climbing approach rather than gradient ascent. The black-box attack is conceptually similar to the white-box one, and differs only in the condition checking whether to apply attempted transformations: whereas the white-box attack uses gradient-related information to decide whether to apply a transformation, the black-box attack queries the model after attempting to transform a function, and accepts the transformation only if the probability of the target class increases.
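A minimal sketch of this hill-climbing loop is given below. The callback signatures (`transform`, `query`), the 0.5 decision threshold, and the omission of the initial undirected randomization are assumptions made to keep the example short; they are not details taken from the attack's implementation.

```python
import random


def hill_climb(binary, functions, transform, query, true_label, max_iters, seed=0):
    """Greedy black-box loop: keep a transformation only if it helps.

    transform(binary, function, rng) -> candidate binary (assumed functionality-preserving)
    query(binary) -> model's probability that the binary is malicious
    """
    rng = random.Random(seed)

    def target_prob(b):
        p_mal = query(b)
        # The target class is the opposite of the true class.
        return 1.0 - p_mal if true_label == 1 else p_mal

    best = target_prob(binary)
    for _ in range(max_iters):
        if best > 0.5:  # assumed decision threshold: misclassified, so stop
            break
        for func in functions:
            candidate = transform(binary, func, rng)
            score = target_prob(candidate)
            if score > best:  # accept only if the target class became more likely
                binary, best = candidate, score
    return binary, best
```

Each attempted transformation costs one model query, so the query budget grows with the number of functions times the number of iterations.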
Transformation Types In this work, we consider two families of transformation types [54, 77], as well as their combination. For the first family, we adopt and extend the transformation types proposed in the in-place randomization (IPR) work of Pappas et al. Given a binary to randomize, Pappas et al. proposed to disassemble it and identify functions and basic blocks, statically perform four types of transformations that preserve the functionality of the code, and then update the binary accordingly from the modified assembly code. The four transformation types considered are: 1) to replace instructions with equivalent ones of the same length (e.g., replacing sub eax,4 with add eax,-4); 2) to reassign registers within functions or a set of basic blocks (e.g., swap all instances of ebx and ecx) if this does not affect code that follows; 3) to reorder instructions, using a dependence graph to ensure that no instruction appears before another one it depends on; and 4) to change the order in which register values are pushed to and popped from the stack to save them across function calls.
To maintain the semantics of the code, the disassembly and transformations are performed conservatively (e.g., speculative disassembly, a disassembly technique that has a relatively high likelihood of misidentifying code, is avoided). IPR does not alter binaries’ sizes and has no measurable effect on their run time.
The original implementation of Pappas et al. could not be used to evade malware detection due to several limitations, including ones preventing it from producing the majority of functionally equivalent binary variants that are conceptually achievable under the four transformation types. Thus, we extend and improve the implementation in various ways. First, we enable the transformations to compose. In other words, unlike Pappas et al.’s implementation, our implementation allows us to iteratively apply different transformation types to the same function. Second, we apply transformations more conservatively to ensure that the functionality of the binaries is preserved (e.g., by not replacing add and sub instructions if they are followed by instructions that read the flags register). Third, compared to the previous implementation, our implementation can handle a larger number of instructions and additional function-calling conventions. In particular, our implementation can rewrite binaries containing additional instructions (e.g., shrd, shld, ccmove) as well as less common calling conventions (e.g., nonstandard returns via increment of esp followed by a jmp instruction) without impacting the binaries’ functionality. Last, we fix bugs in the original implementation (e.g., incorrect checks for writes to memory after reads). Fig. 1 shows an example of transforming code using IPR.
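To make the conservative add/sub replacement concrete, the following sketch swaps `sub r, imm` with `add r, -imm` only when no later instruction can observe the flags before they are overwritten. The tuple representation of instructions and the small mnemonic sets are hypothetical and far from exhaustive, and instruction encodings (and hence the same-length requirement) are not modelled.

```python
# Illustrative, non-exhaustive sets; anything not listed is treated conservatively.
FLAG_WRITERS = {"add", "sub", "cmp", "test", "and", "or", "xor"}  # overwrite status flags
FLAG_NEUTRAL = {"mov", "lea", "push", "pop", "nop"}               # neither read nor write flags


def flags_read_before_overwrite(instrs, start):
    """True if code after index `start` may read the flags set by instrs[start]."""
    for mnemonic, *_ in instrs[start + 1:]:
        if mnemonic in FLAG_WRITERS:
            return False          # flags are overwritten before anyone reads them
        if mnemonic not in FLAG_NEUTRAL:
            return True           # possible reader (jcc, adc, call, ...): be conservative
    return True                   # ran off the end without an overwrite: be conservative


def may_swap_add_sub(instrs, i):
    return instrs[i][0] in ("add", "sub") and not flags_read_before_overwrite(instrs, i)


def swap_add_sub(instr):
    """sub r, imm <-> add r, -imm (same register result; some flags may differ)."""
    mnemonic, reg, imm = instr
    return ("add" if mnemonic == "sub" else "sub", reg, -imm)


code = [("sub", "eax", 4), ("mov", "ebx", "eax"), ("cmp", "ebx", 0), ("jz", "L1")]
if may_swap_add_sub(code, 0):
    code[0] = swap_add_sub(code[0])
print(code[0])
```

The check matters because the two forms leave the same value in the register but can set the carry flag differently, so a following instruction that consumes the carry would behave differently after the swap.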
The second family of transformation types that we build on is based on code displacement (Disp), proposed by Koo and Polychronakis. Similarly to IPR, Disp begins by conservatively disassembling the binary. The original idea of Disp is to move code that can be leveraged as a gadget in code-reuse attacks to a new executable section in order to break the gadget. The original code to be displaced has to be at least five bytes in size so that it can be replaced with a jmp instruction that passes the control to the displaced code. If the displaced code contains more than five bytes, the bytes after the jmp are replaced with trap instructions that terminate the program; these would be executed if a code-reuse attack is attempted. In addition, another jmp instruction is appended immediately after the displaced code to pass the control back to the instruction that should follow. Of course, any displaced instruction that uses an address relative to the instruction-pointer (i.e., IP) register is also updated to reflect the new address after displacement. Disp has a minor effect on binaries’ sizes (2% increase on average) and causes a small amount of run-time overhead (1% on average).
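The byte-level mechanics of displacement can be sketched as follows, under several assumptions: 32-bit x86 with the five-byte near jump (opcode 0xE9 followed by a 32-bit offset relative to the next instruction), `int3` (0xCC) standing in for the trap instruction, and hypothetical addresses. Fixing up IP-relative instructions inside the displaced code is deliberately left out.

```python
import struct

JMP_REL32 = 0xE9  # x86 near jump with a 32-bit relative offset
JMP_SIZE = 5      # one opcode byte plus four offset bytes
TRAP = 0xCC       # int3, assumed here as the trap filler


def encode_jmp(src_addr: int, dst_addr: int) -> bytes:
    """Encode a jmp located at src_addr that lands on dst_addr."""
    rel = dst_addr - (src_addr + JMP_SIZE)  # offset is relative to the next instruction
    return bytes([JMP_REL32]) + struct.pack("<i", rel)


def displace(code: bytes, code_addr: int, new_addr: int):
    """Return (bytes left at the original location, bytes placed in the new section)."""
    if len(code) < JMP_SIZE:
        raise ValueError("need at least five bytes to fit the jmp")
    patched = encode_jmp(code_addr, new_addr) + bytes([TRAP]) * (len(code) - JMP_SIZE)
    # After the displaced code, jump back to the instruction that originally followed it.
    back = encode_jmp(new_addr + len(code), code_addr + len(code))
    return patched, code + back


patched, moved = displace(b"\x90" * 7, code_addr=0x1000, new_addr=0x5000)
print(patched.hex(), moved.hex())
```

The five-byte minimum in the text falls directly out of this encoding: anything shorter cannot hold the jump that redirects execution to the new section.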
We extend Disp in two primary ways. First, we make it possible to displace any set of consecutive instructions within the same basic block, not only ones that belong to gadgets. Second, instead of replacing the original instructions with traps, we replace them with semantic nops: sets of instructions that cumulatively do not affect the memory or register values and have no side effects. These semantic nops get jumped to immediately after the displaced code is done executing. To create the semantic nops, we use the context-free grammar described in Fig. 2. At a high level, a semantic nop can be an atomic instruction (e.g., nop), or recursively defined as an invertible instruction that is followed by a semantic nop and then by the inverse instruction (e.g., push eax followed by a semantic nop and then by pop eax), or as two consecutive semantic nops. When the flags register’s value is saved (i.e., between pushfd and popfd instructions), a semantic nop may contain instructions that affect flags (e.g., add and then subtract a value from a register), and when a register’s value is saved too (i.e., between push r and pop r), a semantic nop may contain instructions that affect the register (e.g., decrement it by a random value). Using the grammar for generating semantic nops, for example, one may generate a semantic nop that stores the flags and ebx registers on the stack (pushfd; push ebx), performs an operation that might affect both registers (e.g., add ebx, 0xff), and then restores the registers (pop ebx; popfd).
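The recursive structure of semantic nops described above can be sketched with a small random generator. This is not the grammar of Fig. 2: the production names, the register list, and the depth bound are assumptions, and only the rules spelled out in the text are modelled.

```python
import random

REGS = ["eax", "ebx", "ecx", "edx"]  # hypothetical choice; esp is deliberately excluded


def semantic_nop(rng, depth=3, flags_saved=False, saved_regs=()):
    """Return instruction strings that, taken together, leave registers and flags unchanged."""
    choices = ["atomic"]
    if depth > 0:
        choices += ["save_flags", "save_reg", "concat"]
        if flags_saved:
            choices.append("add_sub")      # flag-affecting pair is fine while flags are saved
            if saved_regs:
                choices.append("clobber")  # a saved register may be changed freely
    kind = rng.choice(choices)
    if kind == "atomic":
        return ["nop"]
    if kind == "concat":
        return (semantic_nop(rng, depth - 1, flags_saved, saved_regs)
                + semantic_nop(rng, depth - 1, flags_saved, saved_regs))
    if kind == "save_flags":
        return ["pushfd"] + semantic_nop(rng, depth - 1, True, saved_regs) + ["popfd"]
    if kind == "save_reg":
        reg = rng.choice(REGS)
        body = semantic_nop(rng, depth - 1, flags_saved, saved_regs + (reg,))
        return [f"push {reg}"] + body + [f"pop {reg}"]
    value = rng.randrange(1, 256)
    if kind == "add_sub":
        reg = rng.choice(REGS)
        body = semantic_nop(rng, depth - 1, flags_saved, saved_regs)
        return [f"add {reg}, {value:#x}"] + body + [f"sub {reg}, {value:#x}"]
    # "clobber": both the register and the flags are restored by enclosing pops.
    return [f"add {rng.choice(saved_regs)}, {value:#x}"]


print("; ".join(semantic_nop(random.Random(1))))
```

Threading `flags_saved` and `saved_regs` through the recursion is what lets the generator emit state-changing instructions only in contexts where an enclosing push/pop pair will undo them.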
When using Disp, our attacks start by displacing code up to a certain budget, to ensure that the resulting binary’s size does not increase above a threshold (e.g., 1% above the original size). We first divide the budget (expressed as the number of bytes to be displaced) by the number of functions in the binary, and we attempt to displace exactly that number of bytes per function. If multiple options exist for what code in a function to displace, the code to be displaced is chosen at random. If a function does not contain enough code to displace, then we attach semantic nops (occupying the necessary number of bytes) after the displaced code to meet the per-function budget. In the rare case that the function does not have any basic block larger than five bytes, we skip that function. Fig. 3 illustrates an example of displacement where semantic nops are inserted to replace original code, as well as after the displaced code, to consume the budget. Then, in each iteration of modifying the binary to cause it to be misclassified, new semantic nops are chosen at random and used to replace the previously inserted semantic nops if that moves the binary closer to misclassification.
Some of the semantic nops contain integer values that can be set arbitrarily (e.g., see line 12 of Fig. 2). In a white-box setting, the bytes of the binary that correspond to these values can be set to perturb the embedding in the direction that is most similar to the gradient. Namely, if an integer value in the semantic nop corresponds to the $i$th byte in the binary, whose current embedding is $\hat{e}_i$ and whose gradient is $\hat{g}_i$, we set this $i$th byte to $b \in \{0, \dots, 255\}$ such that the cosine similarity between $E(b) - \hat{e}_i$ and $\hat{g}_i$ is maximized. This process is repeated each time a semantic nop is drawn to replace previous semantic nops in white-box attacks.
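The byte selection described above amounts to an exhaustive search over the 256 possible values. The sketch below assumes the embedding is available as a lookup table and treats a zero-length change as having similarity 0; both choices, along with the function names, are illustrative assumptions.

```python
import math


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # assumed convention for a zero vector (e.g., the byte is unchanged)
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def best_byte(embedding, current_byte, grad_i):
    """Pick the byte whose embedding shift is most aligned with the gradient.

    embedding: table mapping each byte value to its embedding vector
    grad_i: gradient of the loss with respect to the embedding of this byte position
    """
    e_cur = embedding[current_byte]

    def score(b):
        delta = [x - y for x, y in zip(embedding[b], e_cur)]
        return cosine(delta, grad_i)

    return max(range(len(embedding)), key=score)


# Toy one-hot-style table: byte b is embedded as [b, 0].
toy_embedding = [[float(b), 0.0] for b in range(256)]
print(best_byte(toy_embedding, current_byte=100, grad_i=[1.0, 0.0]))
```

Since only one byte position changes at a time, the search costs 256 cheap vector comparisons and requires no additional forward or backward pass through the model.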
Prior work has suggested methods for detecting and removing semantic nops from binaries. Such methods might appear viable for defending against Disp-based attacks, though as we discuss in Sec. V, attackers can leverage various techniques to evade semantic-nop detection and removal.
Limitations While our implementation extends prior implementations, it can still be further improved. For instance, our implementation does not displace code that has been displaced in early iterations. A more comprehensive implementation might apply displacements recursively. Furthermore, the composability of IPR and Disp transformations can be enhanced. Particularly, when applying both the IPR and Disp transformations to a binary, both types of transformations affect the original instructions of the binary. However, IPR does not affect the semantic nops that are introduced by Disp. Although there remains room for improvement, we did not pursue the remaining engineering endeavors because the attacks were successful despite the shortcomings of the implementation.
In this section, we provide a comprehensive evaluation of our attack. We begin by providing details about the DNNs and data used for evaluation. We then show that naïve, random, transformations that are not guided via optimization do not lead to misclassification. Subsequently, we provide an evaluation of variants of our attack under a white-box setting, and compare with prior work. Then, we move to discuss evaluations of our attack in the black-box setting, both against the DNNs and commercial anti-viruses. We close the section with experiments to validate that the attacks preserve functionality.
IV-A Datasets and Malware-Detection DNNs
To train malware-detection DNNs, we used malicious binaries from a publicly available dataset that we augmented with benign binaries from standard software packages, as is common practice (e.g., [51, 56]). In particular, we used malware binaries from nine malware families (Gatak, Kelihos v1, Kelihos v3, Lollipop, Obfuscator ACY, Ramnit, Simda, Tracur, and Vundo) that were published as part of a malware-classification competition organized by Microsoft. This dataset contains raw binaries of malware samples targeting Windows machines. As such, the binaries adhere to the Portable Executable format (PE; the standard format for .dll and .exe files). However, to maintain sterility and prevent the binaries from executing, the curators removed their PE headers (which, among other things, contain the entry points of the code). In total, the dataset contains 21,741 binaries that were partitioned into training and test sets by the dataset curators. We further partitioned the test set randomly into one group for validation (i.e., model and hyperparameter tuning) and another for final testing. Table I lists the number of binaries in the training, validation, and test sets.
Prior work [1, 55, 80] used larger malware datasets for training (in some cases containing two orders of magnitude more samples than the Microsoft dataset). Unfortunately, however, the raw binaries from prior work’s datasets are proprietary. Consequently, we resorted to using a publicly available dataset. Nonetheless, the DNNs that we trained achieve comparable performance to those of prior work.
To collect benign binaries, we installed standard packages on a newly created 32-bit Windows 7 virtual machine and gathered the binaries pertaining to these packages. Specifically, we used the Ninite and Chocolatey package managers (https://ninite.com/ and https://chocolatey.org/) to install 179 packages. The packages that we installed included popular ones that are commonly used by a variety of users (such as Chrome, Firefox, WinRAR, and Spotify), as well as packages that are likely to be used by specific user groups, such as developers (e.g., PyCharm), academics (e.g., MiKTeX), and graphics designers (e.g., Gimp). This resulted in 19,534 binaries that we partitioned into training, validation, and test sets of comparable sizes to those for malware (see Table I). When partitioning, we placed binaries from the same packages in the same partitions to ensure that the DNNs learned to tell apart malicious and benign binaries rather than to associate binaries of the same packages with each other.
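The package-aware partitioning described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the function name `split_by_package`, the 60/20/20 fractions, and the example paths are assumptions, and the split is balanced by number of packages rather than by number of binaries.

```python
import random
from collections import defaultdict

def split_by_package(binaries, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split (path, package) pairs into train/validation/test lists.

    All binaries of a package land in the same partition, so a model cannot
    score well merely by recognizing files that ship together.
    The fractions are hypothetical placeholders.
    """
    by_package = defaultdict(list)
    for path, package in binaries:
        by_package[package].append(path)

    packages = sorted(by_package)
    random.Random(seed).shuffle(packages)

    n = len(packages)
    cut1 = round(fractions[0] * n)
    cut2 = cut1 + round(fractions[1] * n)
    groups = (packages[:cut1], packages[cut1:cut2], packages[cut2:])
    return [[path for pkg in group for path in by_package[pkg]] for group in groups]

# Toy example: ten packages with three binaries each.
example = [(f"pkg{i}/file{j}.exe", f"pkg{i}") for i in range(10) for j in range(3)]
train, validation, test = split_by_package(example)
```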
Using the malicious and benign samples, we trained two malware-detection DNNs. Both DNNs receive binaries’ raw bytes as inputs and output the probability that the binaries are malicious. The first DNN, proposed by Krčál et al., receives inputs up to 512 KB in size. We refer to it by AvastNet, in reference to the authors’ affiliation. The second DNN, proposed by Raff et al., receives inputs up to 2 MB in size. We refer to this DNN by MalConv, as per the authors’ naming. Except for the batch-size parameter, we used the same training parameters reported in the papers. We set the batch size to 32 due to memory limitations. In addition, when using benign binaries for training, we excluded the headers. We did so both to remain consistent with the malicious binaries (which do not include headers) and to ensure that the DNNs would not rely on header values that are easily manipulable for classification. As the results below demonstrate, excluding the header leads to DNNs that are more difficult to evade.
The classification performance of the DNNs is reported in Table II. Both DNNs achieve test accuracy of about 99%. Even when restricting the false positive rates (FPRs) conservatively to 0.1% (as is often done by anti-virus vendors ), the true positive rates (TPRs) remain as high as 80–89% (i.e., 80–89% of malicious binaries are detected). The performance results that we computed are superior to the ones reported in the original papers both for classification from raw bytes and from manually crafted features [55, 80]. We believe the reason to be that our dataset was restricted to nine malware families, and expect the performance to slightly decrease when incorporating additional malware families.
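To make the "TPR at a fixed FPR" metric concrete, the sketch below derives a decision threshold from benign scores so that at most a given fraction of benign binaries exceed it, and then measures the fraction of malicious binaries above that threshold. The function names and the toy scores are hypothetical; the text above uses an FPR of 0.1%, whereas the toy example uses 25% so that four benign scores suffice.

```python
def threshold_at_fpr(benign_scores, max_fpr):
    """Return a threshold t such that at most `max_fpr` of benign scores exceed t.

    A binary is flagged as malicious when its score is strictly above t.
    """
    ranked = sorted(benign_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # benign samples allowed above t
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]

def tpr_at_fpr(benign_scores, malicious_scores, max_fpr=0.001):
    """Return (TPR, threshold) when the FPR is capped at `max_fpr`."""
    t = threshold_at_fpr(benign_scores, max_fpr)
    detected = sum(score > t for score in malicious_scores)
    return detected / len(malicious_scores), t

# Toy scores (estimated probability of being malicious).
benign = [0.1, 0.2, 0.3, 0.9]
malicious = [0.95, 0.5, 0.2, 0.31]
tpr, threshold = tpr_at_fpr(benign, malicious, max_fpr=0.25)
```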
In addition to the two DNNs that we trained, we evaluated the attacks using a publicly available DNN that was trained by Anderson and Roth. We refer to this DNN by Endgame, in reference to the authors’ affiliation. Endgame has a similar architecture to MalConv. The salient differences are that: 1) Endgame’s input dimensionality is 1 MB (compared to 2 MB for MalConv); and 2) Endgame uses the header for classification. On a dataset separate from ours that was curated by a computer-security company, Endgame achieved about 92% TPR when the FPR was restricted to 0.1%.
To evaluate attacks against the DNNs, we selected binaries according to three criteria. First, the binaries had to be unpacked. To this end, we used standard packer detectors (specifically, Packerid and Yara) and deemed binaries as unpacked only if none of the detectors exhibited a positive detection. This method is similar to the one followed by Biondi et al. (Biondi et al. used three packer-detection tools instead of two; unfortunately, we were unable to get access to one of the proprietary tools.) While the data used to train and evaluate the performance of the DNNs included packed binaries (we could not exclude potentially packed binaries from the Microsoft dataset due to missing headers), the high accuracy of the DNNs on the test samples suggests that the DNNs’ performance was not impacted by (lack of) packing. Second, the binaries had to be classified correctly and with high confidence by the DNNs that we trained. In particular, malicious (resp., benign) binaries had to be classified as malicious (resp., benign), and the estimated probability that they are malicious had to be above (resp., below) the threshold where the FPR (resp., false negative rate, FNR) is 0.1%. Consequently, our evaluation of the attacks’ success is conservative: the attacks would be more successful for binaries that are initially classified correctly, but not with high confidence. Third, the binaries’ sizes had to be smaller than the DNNs’ input dimensionality. While the DNNs can classify binaries whose size is larger than the input dimensionality (as can be seen from the high classification accuracy on the validation and test sets), we avoided large binaries as a means to prevent evasion by displacing malicious code outside the input range of the DNNs.
Using these criteria, we selected 99 benign binaries from the test set to evaluate the attacks against each of the three DNNs. Leading malware detection to misclassify these benign samples can harm users’ trust in the defense. Unfortunately, we were unable to use the malicious binaries from the Microsoft dataset to evaluate our attacks, as they lack the headers, and so they cannot be disassembled as necessary for the IPR and Disp transformations. Instead, we used VirusShare (an online repository of malware samples) to collect malicious binaries belonging to the nine families that are present in the Microsoft dataset (as indicated by the labels that commercial anti-viruses assigned the samples). Following this approach, we collected a variable number of binaries that were unseen in training to test the attacks against each one of the DNNs, as specified in Table III. The total number of samples that we collected to evaluate the attacks is comparable to that used in prior work on evading malware detection [50, 56, 96, 97].
IV-B Randomly Applied Transformations
We first evaluated whether naïvely transforming binaries at random would lead to evading the DNNs. To do so, for each binary that we used to evaluate the attacks, we created 200 variants using the IPR and Disp transformations and classified them using the DNNs. If any of the variants was misclassified by a DNN, we considered the evasion attempt successful. We set Disp to increase binaries’ sizes by 5% (i.e., the displacement budget was set to 5% of the binary’s original size). We selected 200 and 5% as parameters for this experiment because our attacks were executed for 200 iterations at most, and achieved almost perfect success when increasing binaries’ sizes by 5% (see below).
Except for a single benign binary that was misclassified by after being transformed, no other misclassification occurred. Hence, we can conclude that the DNNs are robust against naïve transformations, and that more principled approaches are needed to mislead them.
IV-C White-Box Attacks vs. DNNs
In the white-box setting, we evaluated seven variants of our attack. One variant, to which we refer by IPR, relies on the IPR transformations. Three variants, Disp-1, Disp-3, and Disp-5, rely on the Disp transformations, where the numbers indicate the displacement budget as a percentage of the binaries’ sizes (e.g., Disp-1 increases binaries’ sizes by 1%). The last three attack variants, IPR+Disp-1, IPR+Disp-3, and IPR+Disp-5, use the IPR and Disp transformations combined. We executed the attacks up to 200 iterations and stopped early if the binaries were misclassified with high confidence. For malicious (resp., benign) binaries, this meant that they were misclassified as benign (resp., malicious) with an estimated probability that they are malicious below the probability where the FPR (resp., FNR) is 0.1%. We set 5% as the maximum displacement budget and 200 as the maximum number of iterations, as we empirically found that the attacks were almost always successful with these parameters.
In addition to our attacks, we implemented and evaluated an attack proposed by Kreuk et al. . To mislead DNNs, the attack of Kreuk et al. appends adversarially crafted bytes to binaries. These bytes are crafted via an iterative algorithm that first computes the gradient of the loss with respect to the embedding of the binary at the th iteration, and then sets the adversarial bytes to minimize the distance of the new embedding from +, where is a scaling parameter. We tested three variants of the attack, denoted by -1, -3, and -5, which increase the binaries’ sizes by 1%, 3%, and 5%, respectively. As a loss function, we used . Similarly to our attacks, we executed Kreuk et al.’s attacks up to 200 iterations, stopping sooner if misclassification with high confidence occurred. Furthermore, we set =1, as we empirically found that it leads to high evasion success.
We measured the success rates of attacks by the percentage of binaries that were misclassified. The results of the experiment are provided in Fig. 4. One can immediately see that attacks using the Disp transformations were more successful than IPR. In fact, IPR was able to mislead only Endgame, achieving 62% success rate at misclassifying malicious binaries as benign, and 74% success rate at misclassifying benign binaries as malicious. IPR was unable to mislead AvastNet and MalConv in any attempt. This indicates that using binaries’ headers for classification, similarly to Endgame, may lead to more vulnerable models than when excluding the header.
In contrast to , other variants of our attack achieved considerable success. For example, -5 achieved high-confidence misclassification in all attempts, except when attempting to mislead to misclassify benign binaries, where 81% of the attempts succeeded. As one would expect, attacks with higher displacement budget were more successful. Specifically, attacks with 5% displacement budget were more successful than ones with 3%, and the latter were more successful than attacks with 1% displacement budget.
Besides achieving higher success rates, Disp-based attacks have another advantage over IPR-based ones: time efficiency. While displacing instructions at random from within a function with n instructions has O(n) time complexity, certain IPR transformations have O(n^2) time complexity. For example, reordering instructions requires building a dependence graph and extracting instructions one after the other. If every instruction in a function depends on previous ones, this process takes O(n^2) time. In practice, we found that while Disp-based attacks took about 159 seconds to run on average, IPR-based ones took 606 seconds.
While had limited success, combining with achieved higher success rates than respective -only attacks with the same displacement budget. For example, +-1 had 6% higher success rate than -1 when misleading to misclassify a malicious binary as benign (97% vs. 91% success rate). Thus, in certain situations, and can be combined to fool the DNNs while increasing binaries’ sizes less than alone.
The variants of Kreuk et al.’s attack achieved success rates comparable to variants of our attack. For example, -5 was almost always able to mislead the DNNs—it achieved 99% and 98% success rate when attempting to mislead and , respectively, to misclassify malicious binaries, and 100% success rate in all the other attempts. One can also see that the success rate increased as the attacks increased the binaries’ sizes. In particular, -5 was more successful at misleading the DNNs than -3, which, in turn, was more successful than -1.
While Kreuk et al.’s attack achieved success rates that are comparable to ours, it is important to highlight that their attack is easier to defend against. As a proof of concept, we implemented a sanitization method to defend against the attack. The method finds all the sections that do not contain instructions (using the IDAPro disassembler ) and masks the sections’ content with zeros. As Kreuk et al.’s attack does not introduce code to the binaries, the defense masks the adversarial bytes that it introduces. Consequently, the evasion success rates of the attack drop significantly. For example, the success rates of -5 against drop to 1% and 11% for malicious and benign binaries, respectively. At the same time, the defense has little-to-no effect on our attacks. For example, -5 achieves 91% and 100% success rates for malicious and benign binaries, respectively. Moreover, the classification accuracy remains high for malicious (99%) and benign (93%) binaries after the defense.
IV-D Black-Box Attacks vs. DNNs
As explained in Sec. III, because the DNNs’ input is discrete, estimating gradient information to mislead them in a black-box setting is not possible. Therefore, the black-box version of Alg. 1 uses a hill-climbing approach that queries the DNN after each attempted transformation to decide whether to keep the transformation. Because querying the DNNs after each attempted transformation leads to a significant increase in the run time of the attacks (30× on a machine with a GeForce GTX 980 GPU), we limited our experiments to transformations with a displacement budget of 5%, and attempted to mislead the DNNs to misclassify malicious binaries. We executed the attacks up to 200 iterations and stopped early if misclassification occurred.
The attacks were most successful against , achieving a success rate of 97%. In contrast, 33% of the evasion attempts against succeeded, and none of the attempts against were successful. In cases of failure, we observed that the optimization process was getting stuck in local minima. We believe that it may be possible to enhance the performance of the attacks in such cases by deploying methods for overcoming local minima (e.g., the Metropolis algorithm or Monte-Carlo tree search).
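The query-and-keep loop described above can be sketched as a generic hill climber. This is a minimal illustration under stated assumptions, not Alg. 1 itself: `hill_climb_evasion`, `toy_patch`, and `toy_score` are hypothetical names, the toy transformation simply overwrites a byte (it is not functionality-preserving), and the toy score stands in for a DNN's estimated probability of maliciousness.

```python
import random

def hill_climb_evasion(sample, transforms, score, threshold, max_iters=200, seed=0):
    """Black-box hill climbing: keep a transformation only if the score drops.

    `score(sample)` plays the role of one query to the detector; the loop stops
    early once the score falls below `threshold` (i.e., misclassification).
    """
    rng = random.Random(seed)
    best, best_score = sample, score(sample)
    for _ in range(max_iters):
        if best_score < threshold:
            break
        candidate = rng.choice(transforms)(best, rng)
        candidate_score = score(candidate)  # one detector query per attempt
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score

def toy_patch(data, rng):
    """Toy transformation: overwrite one random byte with 0x90."""
    i = rng.randrange(len(data))
    return data[:i] + b"\x90" + data[i + 1:]

def toy_score(data):
    """Toy detector: fraction of 0xFF bytes in the input."""
    return data.count(0xFF) / len(data)

original = bytes([0xFF] * 20)
evasive, final_score = hill_climb_evasion(original, [toy_patch], toy_score, threshold=0.5)
```

Because a candidate is accepted only when it strictly improves the score, such a loop can stall in a local minimum, which matches the failure mode reported for some of the black-box attempts.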
Motivated by the universality property of adversarial examples  and the finding of Kreuk et al. that adversarial bytes generated for one binary may lead to evasion when appended to another binary , we wanted to see if the transformations applied to one binary can lead to evasion when applied to another. If so, attackers in a black-box setting may invest considerable effort to transform one binary to mislead malware-detection and then apply the same transformations to other binaries. We focused on transformations and tested whether the semantic nops that were added to certain binaries via the -5 attack in the white-box setting lead to evasion when added to other binaries. To do so, we developed a modified version of that displaces code within binaries at random, but instead of drawing the semantic nops randomly, it borrows them from another previously transformed binary.
We tested this approach with malicious binaries and found that it leads to relatively high success rates. For and , certain transformations led to evasion success rates as high as 75% and 86%, respectively, when borrowed from one binary and applied to other binaries (i.e., merely 14%–25% lower success rates than for white-box attacks). The success rates were more limited against , achieving a maximum of 24%.
IV-E Transferability of Attacks to Commercial Anti-Viruses
To assess whether our attacks affect commercial anti-viruses, we tested how the binaries that were misclassified by the DNNs with high confidence in the white-box setting get classified by anti-viruses available via VirusTotal —an online service that aggregates the results of 68 commercial anti-viruses. Since anti-viruses often rely on ML for malware detection, and since prior work has shown that adversarial examples that evade one ML model often evade other models (a phenomenon called transferability) [33, 74], we expected that the malicious (resp., benign) binaries generated by our attacks would be classified as malicious by fewer (resp., more) anti-viruses than the original binaries.
As a baseline, we first classified the original binaries using the VirusTotal anti-viruses. As one would expect, all the malicious binaries were detected by several anti-viruses. The median number of anti-viruses that detected any particular malware binary as malicious was 55, out of 68 total anti-viruses. In contrast, the original benign binaries were detected by a median of 0 anti-viruses, with a total of five false positives across all binaries and anti-viruses. To further gauge the accuracy of the commercial anti-viruses, we used them to classify binaries that were transformed at random using the and transformation types (in the same manner as Sec. IV-B). We found that certain anti-viruses were susceptible to such simple evasion attempts—the median number of anti-viruses that detected the malicious binaries correctly decreased to 42. At the same time, the median number of anti-viruses that detected benign binaries as malicious remained 0. Presumably, some anti-viruses were evaded by random transformations due to using fragile detection mechanisms, such as signatures.
summarizes the effect of our attacks on the number of positive detections (i.e., classification of binaries as malicious) by the anti-viruses. Compared to the original malicious binaries and ones that were transformed at random, the malicious binaries transformed by our attacks were detected as malicious by fewer anti-viruses. The median number of anti-viruses that correctly detected the malicious binaries decreased from 55 for the original binaries and 42 for ones transformed at random to 33–36, depending on the attack variant and the targeted DNN. According to a Kruskal-Wallis test, this reduction is statistically significant (p < 0.01 after Bonferroni correction). In other words, the malicious binaries that were transformed by our attacks were detected by only 49%–53% of the VirusTotal anti-viruses in the median case.
The number of positive detections of benign binaries increased after they were transformed by our attacks: The median number of anti-viruses that detected the benign binaries as malicious was one or two, depending on the attack variant and the targeted DNN. In certain cases, the number of positive detections was as high as 19 (i.e., 28% of the VirusTotal anti-viruses reported the binary as malicious). Except for one attack ( targeting ), the increase in the number of detections is statistically significant (p < 0.01 after Bonferroni correction), according to a Kruskal-Wallis test.
Our attacks evaded a larger number of anti-viruses compared to random transformations, likely due to transferring from our DNNs to ML detectors that are used by the anti-viruses. A glance at the websites of the anti-viruses’ vendors showed that 15 of the 68 vendors explicitly advertise relying on ML for malware detection. These anti-viruses were especially susceptible to evasion by our attacks. Of particular note, one vendor advertises that it relies solely on ML for malware detection. This vendor’s anti-virus misclassified 78% of the benign binaries that were produced by one variant of our attack as malicious. In general, a median of 1–2 anti-viruses (from the 15 vendors) misclassified benign binaries that were processed by our attacks as malicious. Even more concerning, a popular anti-virus whose vendor reports to rely on ML misclassified 85% of the malicious binaries produced by a variant of our attack as benign. Generally, malicious binaries that were produced by our attacks were detected by a median number of 7–9 anti-viruses of the 15—down from 12 positive detections for the original binaries. All in all, while online advertising (or lack thereof) is a weak indicator of the nature of detectors used by anti-viruses (e.g., some prominent vendors do not explicitly advertise the use of ML), our results support that binaries that were produced by our attacks were able to evade ML-based detectors that are used by anti-virus vendors.
A key feature of our attacks is that they transform binaries to mislead DNNs while preserving their functionality. We followed standard practices from the binary-diversification literature [53, 54, 77] to ensure that the functionality of the binaries was kept intact after being processed by our attacks. First, we transformed ten different benign binaries (e.g., python.exe of Python version 2.7, and less.exe and grep.exe from Cygwin, https://www.cygwin.com/) with our attacks and manually validated that they functioned properly after being transformed. For example, we were still able to search files with grep after the transformations. Second, we transformed the .exe and .dll files of a stress-testing tool (https://www.passmark.com/products/performancetest/) with our attacks and checked that the tool’s tests passed after the transformations. Using stress-testing tools to evaluate the correctness of binary-transformation methods is common, as such tools are expected to cover most branches affected by the transformations. Third, and last, we also transformed ten malware binaries and used the Cuckoo Sandbox, a popular sandbox for malware analysis, to check that their behavior remained the same. All ten binaries attempted to access the same hosts, IP addresses, files, APIs, and registry keys before and after being transformed.
Our proposed attacks achieved high success rates at fooling DNNs for malware detection in white-box and black-box settings. The attacks were also able to mislead commercial anti-viruses, especially ones that leverage ML algorithms. To protect users and their systems, it is important to develop mitigation measures to make malware detection robust against evasion by our attacks. Moreover, it is important to consider ways to extend our attacks to other settings to understand the weaknesses of other systems and help improve their security. Next, we discuss potential mitigations and extensions to our attacks.
V-A Potential Mitigations
) is infeasible, as the attacks are computationally expensive. Depending on the attack variant, it took an average of 159 or 606 seconds to run an attack. As a result, running just a single epoch of adversarial training would take several weeks (using our hardware configuration), as each iteration of training requires running an attack for every sample in the training batch. Moreover, while adversarial training might increase the DNNs’ robustness against attackers using certain transformation types, attackers using new transformation types may still succeed at evasion. Defenses that provide formal guarantees (e.g., [52, 69]) are even more computationally expensive than adversarial training. Moreover, those defenses are restricted to adversarial perturbations that, unlike the ones produced by our attacks, have small - and -norms. Prior defenses that transform the input before classification (e.g., via quantization) are designed mainly for images and do not directly apply to binaries. Lastly, signature-based malware detection would not be effective, as our attacks are stochastic and produce different variants of the binaries after different executions.
Differently from prior attacks on DNNs for malware detection [50, 56, 97], our attacks do not merely append adversarially crafted bytes to binaries, or insert them between sections. Such attacks may be defended against by detecting and sanitizing the inserted bytes via static analysis methods (e.g., similarly to the proof of concept shown in Sec. IV-C, or using other methods). Instead, our attacks transform binaries’ original code, and extend binaries only by inserting instructions that are executed at run time at various parts of the binaries. As a result, our attacks are difficult to defend against via static or dynamic analysis methods (e.g., by detecting and removing unreachable code), especially when augmented by measures to evade these methods.
Binary normalization [3, 18, 103] is an approach that was proposed to enhance malware detection that seemed viable for defending against our attacks. The high-level idea of normalization is to employ certain transformations to map binaries to a standard form and thus undo attackers’ evasion attempts before classifying the binaries as malicious or benign. For example, Christodorescu et al. proposed a method to detect and remove semantic nops from binaries before classification, and showed that it improves the performance of commercial anti-viruses. To mitigate our Disp-based attacks, we considered using the semantic-nop detection and removal method followed by a method to restore the displaced code to its original location. Unfortunately, we realized that such a defense can be undermined using opaque predicates [22, 72]. Opaque predicates are predicates whose value (w.l.o.g., assume true) is known a priori to the attacker, but is hard for the defender to deduce. Often, they are based on NP-hard problems. Using opaque predicates, attackers can produce semantic nops that include instructions that affect the memory and registers only if an opaque predicate evaluates to false. Since opaque predicates are hard for defenders to deduce, the defenders are likely to have to assume that the semantic nops impact the behavior of the program. As a result, the semantic nops would survive the defenders’ detection and removal attempts. As an alternative to opaque predicates, attackers can also use evasive predicates, i.e., predicates that evaluate to true or false with an overwhelming probability (e.g., checking if a randomly drawn 32-bit integer is equal to 0). In this case, the binary will function properly the majority of the time, and may function differently or crash once every many executions.
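To illustrate the two kinds of predicates discussed above, the sketch below guards a state-changing statement with a predicate that is always true, and separately shows an evasive predicate that is true only with probability 2^-32. All names are hypothetical, and the arithmetic predicate used here (the product of two consecutive integers is even) is deliberately simple; it is far easier to analyze than the hard-to-deduce predicates that an attacker would use in practice.

```python
import random

def opaque_true(x: int) -> bool:
    """Always true: the product of two consecutive integers is even."""
    return (x * (x + 1)) % 2 == 0

def guarded_semantic_nop(state: dict, x: int) -> dict:
    """Looks as if it may modify `state`, but the guarded branch never runs."""
    if not opaque_true(x):
        state["eax"] = 0xDEAD  # dead code: would alter program state if reached
    return state

def evasive_predicate(rng: random.Random) -> bool:
    """True only when a random 32-bit integer equals 0 (probability 2**-32)."""
    return rng.getrandbits(32) == 0

registers = {"eax": 1, "ebx": 2}
after = guarded_semantic_nop(dict(registers), x=12345)
```

A defender who cannot prove that `opaque_true` always holds has to assume that the guarded assignment may execute, and therefore cannot safely remove the surrounding instructions.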
The normalization methods proposed by prior work would not apply to the transformations performed by our IPR-based attacks. Therefore, we explored methods to normalize binaries to a standard form to undo the effects of IPR before classification. We found that a normalization process that leverages the IPR transformations to map binaries to the form with the lowest lexicographic representation (where the alphabet contains all possible 256 byte values) is effective at preventing IPR-based attacks. Formally, if $[x]$ is the equivalence class of binaries that are functionally equivalent to $x$ and that can be produced via the IPR transformation types, then the normalization process produces an output $x_{norm} \in [x]$ such that $x_{norm} \le x'$ in lexicographic order for every $x' \in [x]$. Alg. 2 in App. -A presents an algorithm that computes the normalized form of a binary when executed for a large number of iterations, and approximates it when executed for a few iterations. At a high level, the algorithm applies the IPR transformations iteratively in a way that attempts to reduce the lexicographic representation after every iteration. We found that executing the algorithm for ten iterations is sufficient to defend against IPR-based attacks. In particular, we executed the normalization algorithm using the malicious and benign binaries produced by the IPR-based attacks to fool Endgame in the white-box setting, and found that the success rates dropped to 3% and 0%, respectively, compared to 62% and 74% before normalization. At the same time, the classification accuracy over the original binaries was not affected by normalization. As our experiments in Sec. IV have shown, generating functionally equivalent variants of binaries via random transformations results in correct classifications almost all of the time. Normalization of binaries to the minimal lexicographic representation deterministically leads to the specific functionally equivalent variants that get correctly classified with high likelihood.
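The idea of iteratively moving toward the lexicographically smallest equivalent variant can be sketched as a greedy loop. This is not Alg. 2 (which is not reproduced here) but a simplified stand-in: `normalize` and `swap_adjacent` are hypothetical names, and the toy transformations swap adjacent fixed-width "instructions" under the assumption that they are independent and therefore safe to reorder.

```python
def normalize(binary, transforms, max_iters=10):
    """Greedily apply the transformation that yields the smallest byte string.

    Python compares `bytes` objects lexicographically by byte value, which
    matches an alphabet of all 256 byte values. The loop stops when no
    transformation lowers the representation or after `max_iters` iterations.
    """
    current = binary
    for _ in range(max_iters):
        best = min([t(current) for t in transforms] + [current])
        if best == current:
            break
        current = best
    return current

def swap_adjacent(k, width=2):
    """Toy equivalence-preserving transformation: swap instructions k and k+1."""
    def transform(code):
        i, j = k * width, (k + 1) * width
        return code[:i] + code[j:j + width] + code[i:j] + code[j + width:]
    return transform

transforms = [swap_adjacent(0), swap_adjacent(1)]
variant_a = bytes([3, 0, 1, 0, 2, 0])
variant_b = bytes([2, 0, 3, 0, 1, 0])
normal_a = normalize(variant_a, transforms)
normal_b = normalize(variant_b, transforms)
```

In this toy case both variants reach the same normal form, so a classifier would see identical inputs. A greedy search of this kind only approximates the true minimum in general, in line with the remark above that a few iterations approximate the normalized form.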
Instruction Masking. While normalization was useful for defending against IPR-based attacks, it cannot mitigate the more pernicious Disp-based attacks that are augmented with opaque or evasive predicates. Moreover, normalization has the general limitations that attackers could use transformations that the normalization algorithm is not aware of or could obfuscate code to inhibit normalization. Therefore, we explored additional defensive measures. In particular, motivated by the fact that randomizing binaries without the guidance of an optimization process is unlikely to lead to misclassification, we explored whether masking instructions at random can mitigate attacks while maintaining high performance on the original binaries. The defense works by selecting a random subset of the bytes that pertain to instructions and masking them with zeros (a commonly used value to pad sections in binaries). While the masking is likely to result in an ill-formed binary that is unlikely to execute properly (if at all), the masking only occurs before classification, which does not require a functional binary. Depending on the classification result, one can decide whether or not to execute the unmasked binary.
We tested the defense on binaries generated via the IPR+Disp-5 white-box attack and found that it is effective at mitigating attacks. For example, when masking 25% of the bytes pertaining to instructions, the success rates of the attack decrease from 83%–100% for malicious and benign binaries against the three DNNs to 0%–20%, while the accuracy on the original samples was only slightly affected (e.g., it became 94% for ). Masking less than 25% of the instructions’ bytes was not as effective at mitigating attacks, while masking more than 25% led to a significant decrease in accuracy on the original samples.
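A minimal sketch of the masking step follows, assuming that the offsets of instruction bytes are already known (e.g., from a disassembler, which is not modeled here). The function name `mask_instruction_bytes` and the toy inputs are hypothetical; the 25% default mirrors the fraction reported above.

```python
import random

def mask_instruction_bytes(binary, instruction_offsets, fraction=0.25, seed=None):
    """Zero out a random `fraction` of the bytes that belong to instructions.

    The masked copy is meant only as classifier input; the original binary is
    left untouched and is the one that would eventually be executed.
    """
    rng = random.Random(seed)
    offsets = list(instruction_offsets)
    chosen = rng.sample(offsets, int(len(offsets) * fraction))
    masked = bytearray(binary)
    for offset in chosen:
        masked[offset] = 0
    return bytes(masked)

# Toy input: 16 bytes, of which offsets 4..11 are treated as instruction bytes.
toy_binary = bytes([0xAA] * 16)
masked_binary = mask_instruction_bytes(toy_binary, range(4, 12), fraction=0.25, seed=7)
```

Drawing a fresh random subset for each classification means an attacker cannot know in advance which bytes the classifier will see.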
Detecting Adversarial Examples To prevent binaries transformed with our attacks (i.e., adversarial examples) from fooling malware detection, defenders may attempt to deploy methods to detect them. In cases of positive detections of adversarial examples, defenders may immediately classify them as malicious (regardless of whether they were originally malicious or benign). For example, because -based attacks increase binaries’ sizes and introduce additional jmp instructions, defenders may train statistical ML models that use features such as binaries’ sizes and the ratio between jmp instructions and other instructions to detect adversarial examples. While training relatively accurate detection models may be feasible, we expect this task to be difficult, as the attacks increase binaries’ sizes only slightly (1%–5%), and do not introduce many jmp instructions (7% median increase for binaries transformed via -5). Furthermore, approaches for detecting adversarial examples are likely to be susceptible to evasion attacks (e.g., by introducing instructions after opaque predicates to decrease the ratio between jmp instructions and others). Last, another risk that defenders should take into account is that the defense should be able to precisely distinguish between adversarial examples and non-adversarial benign binaries that are transformed by similar methods to mitigate code-reuse attacks [54, 77].
Takeaways While masking a subset of the bytes that pertain to instructions led to better performance on adversarial examples, it was still unable to prevent all evasion attempts. Although the defense may raise the bar for attackers, and make attacks even more difficult if combined with a method to detect adversarial examples, these defenses do not provide formal guarantees, and so attackers may be able to adapt to undermine them. For example, attackers may build on techniques for optimization over expectations to generate binaries that would mislead the DNNs even when a large number of instructions is masked, in a similar manner to how attackers can evade image-classification DNNs under varying lighting conditions and camera angles [7, 30, 91, 92]. In fact, prior work has already demonstrated that defenses without formal guarantees are often vulnerable to adaptive, more sophisticated attacks. Thus, there is no clear defense to prevent attacks against the DNNs that we studied in this work, nor even a general method to prevent attackers from fooling ML models via arbitrary perturbations. We therefore advocate for augmenting malware-detection systems with methods that are not based on ML (e.g., ones using templates to reason about the semantics of programs), and against the use of ML-only detection methods, which have recently become popular.
V-B Potential Extensions
Our work focuses on attacks targeting DNNs for malware detection from raw bytes. Nevertheless, we believe that it can be extended to help study and improve the robustness of other malware-detection methods. For example, prior work studied the use of n-gram features for malware classification [51, 81]. By transforming binaries, our attacks can potentially change the n-gram statistics to evade malware detection.
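To make the effect on n-gram statistics concrete, the following Python sketch counts byte n-grams before and after a functionality-preserving change. The helper `byte_ngram_counts` is hypothetical, and the toy byte strings are chosen for illustration: on x86, `mov eax, ecx` can be encoded either as `89 C8` or as `8B C1`, so swapping one encoding for the other alters the bytes without altering behavior.

```python
from collections import Counter


def byte_ngram_counts(data, n=2):
    """Count every contiguous n-byte subsequence in `data`."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


# Two encodings of the same code: mov eax, ecx; add eax, 1
original = bytes.fromhex("89c883c001")      # mov encoded as 89 C8
transformed = bytes.fromhex("8bc183c001")   # mov encoded as 8B C1

before = byte_ngram_counts(original)
after = byte_ngram_counts(transformed)

# 2-grams present in both versions, and 2-grams that the change removed.
shared = set(before) & set(after)
removed = set(before) - set(after)
```

Here half of the 2-grams of the original snippet disappear after the substitution, which suggests how a sequence of such transformations could shift the features that an n-gram-based classifier relies on.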
Another potential extension to our work is to study and improve the robustness of clone-search methods (e.g., [28]) that are often used in reverse engineering for studying new malware, detecting patent infringements, or finding vulnerabilities in software. Ding et al. recently suggested the use of neural networks to map assembly code to vector representations that are similar for clones and different for non-clones [28]. Building on our attacks, we believe that attackers could manipulate the representations generated by such neural networks to make the representations of clones different (e.g., to make it difficult to study new malware), or to make the representations of non-clones similar (e.g., to support a fake patent-infringement case).
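The clone-search setting can be sketched in a few lines of Python. The sketch assumes that code fragments have already been mapped to vectors and that similarity is measured by cosine similarity against a fixed threshold; the function names, the threshold of 0.9, and the toy vectors are all assumptions for illustration and are not taken from any particular clone-search system.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def is_clone(u, v, threshold=0.9):
    """Flag two fragments as clones if their vectors are similar enough."""
    return cosine_similarity(u, v) >= threshold


# Toy representations (hypothetical values).
known_function = [1.0, 0.0, 1.0]
same_function_recompiled = [0.9, 0.1, 1.0]
unrelated_function = [0.0, 1.0, 0.0]

match = is_clone(known_function, same_function_recompiled)
mismatch = is_clone(known_function, unrelated_function)
```

Under these assumptions, an attacker's goal would be to transform a binary so that the similarity of true clones falls below the threshold, or so that the similarity of unrelated code rises above it.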
Our work proposes evasion attacks on DNNs for malware detection. Unlike prior work, the attacks do not merely insert adversarially crafted bytes to mislead detection. Instead, guided by optimization processes, our attacks transform the instructions of binaries to fool malware detection while keeping the functionality of the binaries intact. As a result, these attacks are challenging to defend against. We conservatively evaluated different variants of our attack against three DNNs under white-box and black-box settings, and found that the attacks succeeded up to 100% of the time. Moreover, we found that the attacks pose a security risk to commercial anti-viruses, particularly ones using ML, achieving evasion success rates of up to 85%. We explored several potential defenses, and found some to be promising. Nevertheless, adaptive adversaries remain a risk, and we recommend the deployment of multiple detection algorithms, including ones not based on ML, to raise the bar against such adversaries.
We would like to thank Sandeep Bhatkar, Leyla Bilge, Yufei Han, Kevin Roundy, and Hugh Thompson for helpful discussions. This work was supported in part by the Multidisciplinary University Research Initiative (MURI) Cyber Deception grant; by NSF grants 1801391 and 1801494; by the National Security Agency under Award No. H9823018D0008; by gifts from Google and Nvidia, and from Lockheed Martin and NATO through Carnegie Mellon CyLab; and by a CyLab Presidential Fellowship and a NortonLifeLock Research Group Fellowship.
[1] (2018) Ember: An open dataset for training static PE malware machine learning models. arXiv preprint arXiv:1804.04637.
[2] Learning to evade static PE machine learning malware models via reinforcement learning. arXiv preprint arXiv:1801.08917.
[3] (2012) A general paradigm for normalizing metamorphic malwares. In Proc. FIT.
[4] (2014) DREBIN: effective and explainable detection of Android malware in your pocket. In Proc. NDSS.
[5] (2018) Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proc. ICML.
[6] (2018) On the robustness of the CVPR 2018 white-box adversarial example defenses. arXiv preprint arXiv:1804.03286.
[7] (2018) Synthesizing robust adversarial examples. In Proc. ICML.
[8] Adversarial transformation networks: learning to generate adversarial examples. In Proc. AAAI.
[9] (2014) Obfuscation for evasive functions. In Proc. TCC.
[10] (2013) Evasion attacks against machine learning at test time. In Proc. ECML PKDD.
[11] (2018) Effective, efficient, and robust packing detection and classification. Computers and Security.
[12] (2017) Adversarial examples are not easily detected: bypassing ten detection methods. In Proc. AISec.
[13] (2017) Towards evaluating the robustness of neural networks. In Proc. IEEE S&P.
[14] (2018) ShapeShifter: robust physical adversarial attack on Faster R-CNN object detector. In Proc. ECML PKDD.
[15] (2018) Towards paving the way for large-scale Windows malware analysis: generic binary unpacking with orders-of-magnitude performance boost. In Proc. CCS.
[16] (2005) Semantics-aware malware detection. In Proc. IEEE S&P.
[17] (2004) Testing malware detectors. In Proc. ISSTA.
[18] (2005) Malware normalization. Technical report, University of Wisconsin-Madison.
[19] (2004–) VirusTotal. https://www.virustotal.com/ Online; accessed 17 June 2019.
[20] (2017) Houdini: Fooling deep structured prediction models. In Proc. NIPS.
[21] (2019) Certified adversarial robustness via randomized smoothing. arXiv preprint arXiv:1902.02918.
[22] (1997) A taxonomy of obfuscating transformations. Technical report, The University of Auckland.
[23] (2018) What are deep neural networks learning about malware? https://www.fireeye.com/blog/threat-research/2018/12/what-are-deep-neural-networks-learning-about-malware.html Online; accessed 1 July 2019.
[24] Cylance: artificial intelligence based advanced threat prevention. https://www.cylance.com/en-us/index.html Accessed on 06-17-2019.
[25] (2017) Evading classifiers by morphing in the dark. In Proc. CCS.
[26] Explaining vulnerabilities of deep learning to adversarial malware binaries. arXiv preprint arXiv:1901.03583.
[27] (2017) Yes, machine learning can be more secure! A case study on Android malware detection. IEEE Transactions on Dependable and Secure Computing.
[28] (2019) Asm2Vec: Boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In Proc. IEEE S&P.
[29] (2017) A rotation and a translation suffice: fooling CNNs with simple transformations. In Proc. NeurIPSW.
[30] (2018) Robust physical-world attacks on machine learning models. In Proc. CVPR.
[31] (2017) Detecting adversarial samples from artifacts. arXiv preprint arXiv:1703.00410.
[32] (2018) Non-negative networks against adversarial attacks. arXiv preprint arXiv:1806.06108.
[33] (2015) Explaining and harnessing adversarial examples. In Proc. ICLR.
[34] (2017) On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280.
[35] (2017) Adversarial examples for malware detection. In Proc. ESORICS.
[36] (2012) The Cuckoo Sandbox. https://cuckoosandbox.org/ Online; accessed 21 June 2019.
[37] (2018) Countering adversarial images using input transformations.
[38] (2005) Practical analysis of stripped binary code. ACM SIGARCH Computer Architecture News 33 (5), pp. 63–68.
[39] (2009) So long, and no thanks for the externalities: the rational rejection of security advice by users. In Proc. NSPW.
[40] IDA: About. https://www.hex-rays.com/products/ida/ Online; accessed 13 September 2019.
[41] (2017) Generating adversarial malware examples for black-box attacks based on GAN. arXiv preprint arXiv:1702.05983.
[42] (2016) MtNet: A multi-task neural network for dynamic malware classification. In Proc. DIMVA.
[43] (2019) Prior convictions: black-box adversarial attacks with bandits and priors. In Proc. ICLR.
[44] (2018) Adversarially robust malware detection using monotonic classification. In Proc. IWSPA.
[45] (2015) Obfuscator-LLVM – software protection for the masses. In Proc. IWSP.
[46] (2018) Adversarial logit pairing. arXiv preprint arXiv:1803.06373.
[47] (2016) Evasion and hardening of tree ensemble classifiers. In Proc. ICML.
[48] (2019) PE format. https://docs.microsoft.com/en-us/windows/desktop/debug/pe-format Accessed on 06-03-2019.
[49] (2009) Effective and efficient malware detection at the end host. In Proc. USENIX Security.
[50] (2018) Adversarial malware binaries: evading deep learning for malware detection in executables. In Proc. EUSIPCO.
[51] (2006) Learning to detect and classify malicious executables in the wild. Journal of Machine Learning Research 7 (Dec), pp. 2721–2744.
[52] (2018) Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proc. ICML.
[53] (2018) Compiler-assisted code randomization. In Proc. IEEE S&P.
[54] (2016) Juggling the gadgets: binary-level code randomization using instruction displacement. In Proc. AsiaCCS.
[55] (2018) Deep convolutional malware classifiers can learn from raw executables and labels only. In Proc. ICLRW.
[56] (2018) Adversarial examples on discrete sequences for beating whole-binary malware detection. In Proc. NeurIPSW.
[57] (2004) Static disassembly of obfuscated binaries. In Proc. USENIX Security.
[58] (2017) Adversarial machine learning at scale. In Proc. ICLR.
[59] (2014) SoK: Automated software diversity. In Proc. IEEE S&P.
[60] (2014) Distributed representations of sentences and documents. In Proc. ICML.
[61] (2019) Certified robustness to adversarial examples with differential privacy. In Proc. IEEE S&P.
[62] (2018) Defense against adversarial attacks using high-level representation guided denoiser. In Proc. CVPR.
[63] (2005) Adversarial learning. In Proc. KDD.
[64] (2018) Towards deep learning models resistant to adversarial attacks. In Proc. ICLR.
[65] (1987) Superoptimizer: A look at the smallest program. ACM SIGARCH Computer Architecture News 15 (5), pp. 122–126.
[66] (2017) MagNet: A two-pronged defense against adversarial examples. In Proc. CCS.
[67] (2018) Adversarial binaries for authorship identification. arXiv preprint arXiv:1809.08316.
[68] (2017) On detecting adversarial perturbations. In Proc. ICLR.
[69] (2018) Differentiable abstract interpretation for provably robust neural networks. In Proc. ICML.
[70] (2017) Universal adversarial perturbations. In Proc. CVPR.
[71] (2016) DeepFool: a simple and accurate method to fool deep neural networks. In Proc. CVPR.
[72] (2007) Limits of static analysis for malware detection. In Proc. ACSAC.
[73] (2018) How to escape local optima in black box optimisation: When non-elitism outperforms elitism. Algorithmica 80 (5), pp. 1604–1633.
[74] (2017) Practical black-box attacks against machine learning. In Proc. AsiaCCS.
[75] (2016) The limitations of deep learning in adversarial settings. In Proc. IEEE Euro S&P.
[76] Crafting adversarial input sequences for recurrent neural networks. In Proc. MILCOM.
[77] (2012) Smashing the gadgets: hindering return-oriented programming using in-place code randomization. In Proc. IEEE S&P.
[78] Imperceptible, robust, and targeted adversarial examples for automatic speech recognition. In Proc. ICML.
[79] (2019) Misleading authorship attribution of source code using adversarial learning. In Proc. USENIX Security.
[80] (2018) Malware detection by eating a whole EXE. In Proc. AAAIW.
[81] (2018) An investigation of byte n-gram features for malware classification. Journal of Computer Virology and Hacking Techniques 14 (1), pp. 1–20.
[82] (2012) VirusShare. https://virusshare.com/ Online; accessed 18 June 2019.
[83] (2018) Microsoft malware classification challenge. arXiv preprint arXiv:1802.10135.
[84] (2018) Generic black-box end-to-end attack against state of the art API call based malware classifiers. In Proc. RAID.
[85] (2013) Binary-code obfuscations in prevalent packer tools. ACM Computing Surveys (CSUR) 46 (1), pp. 4.
[86] (2018) Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In Proc. ICLR.
[87] (2013) Stochastic superoptimization. In Proc. ASPLOS.
[88] (2019) Adversarial attacks against automatic speech recognition systems via psychoacoustic hiding. In Proc. NDSS.
[89] (2014) Packer YARA ruleset. https://github.com/sooshie/packerid Online; accessed 18 June 2019.
[90] (2018) On the suitability of lp-norms for creating and preventing adversarial examples. In Proc. CVPRW.
[91] Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proc. CCS.
[92] (2017) Adversarial generative nets: neural network attacks on state-of-the-art face recognition. arXiv preprint arXiv:1801.00349.
[93] A4NT: author attribute anonymity by adversarial training of neural machine translation. In Proc. USENIX Security.
[94] (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529 (7587), pp. 484.
[95] Counterstrike: defending deep learning architectures against adversarial samples by Langevin dynamics with supervised denoising autoencoder. arXiv preprint arXiv:1805.12017.
[96] (2014) Practical evasion of a learning-based classifier: A case study. In Proc. IEEE S&P.
[97] (2018) Exploring adversarial examples in malware detection. In Proc. AAAIW.
[98] (2014) Intriguing properties of neural networks. In Proc. ICLR.
[99] (2005) The art of computer virus research and defense. Pearson Education.
[100] SoK: deep packer inspection: a longitudinal study of the complexity of run-time packers. In Proc. IEEE S&P.
[101] (2016) Packer YARA ruleset. https://github.com/Yara-Rules/rules/tree/master/Packers Online; accessed 18 June 2019.
[102] (2002) Mimicry attacks on host-based intrusion detection systems. In Proc. CCS.
[103] (2006) Normalizing metamorphic malware using term rewriting. In Proc. SCAM.
[104] (2016) Uroboros: instrumenting stripped binaries with static reassembling. In Proc. SANER.
[105] (2018) Generating adversarial examples with adversarial networks. In Proc. IJCAI.
[106] (2018) Feature denoising for improving adversarial robustness. arXiv preprint arXiv:1812.03411.
[107] (2018) Feature squeezing: detecting adversarial examples in deep neural networks. In Proc. NDSS.
[108] (2016) Automatically evading classifiers. In Proc. NDSS.
[109] (2019) Theoretically principled trade-off between robustness and accuracy. arXiv preprint arXiv:1901.08573.