MDEA: Malware Detection with Evolutionary Adversarial Learning

02/09/2020 ∙ by Xiruo Wang, et al. ∙ The University of Texas at Austin

Malware detection systems have used machine learning to detect malware in programs. These applications feed raw or processed binary data into neural network models that classify files as benign or malicious. Even though this approach has proven effective against dynamic changes, such as encrypting, obfuscating and packing techniques, it is vulnerable to specific evasion attacks in which small changes in the input data cause misclassification at test time. This paper proposes a new approach: MDEA, an Adversarial Malware Detection model that uses evolutionary optimization to create attack samples to make the network robust against evasion attacks. By retraining the model with the evolved malware samples, its performance improves by a significant margin.




I Introduction

The high proliferation of and dependence on computing resources in daily life has greatly increased the potential of malware to harm consumers [1]. It is estimated that almost one in four computers operating in the U.S. were already infected by malware in 2008 [22] and, according to Kaspersky Lab, up to one billion dollars was stolen from financial institutions worldwide due to malware attacks in 2015 [17]. More recently, the notorious and widespread NotPetya ransomware attack is estimated to have caused 10 billion dollars in damages worldwide. Even worse, as reported by McAfee Labs, the diversity of malware is still evolving in expanding areas such that in Q1 2018, on average, five new malware samples were generated per second [4]. As a specific example, total coin miner malware rose by 629% in Q1 2018 to more than 2.9 million samples [4].

As a result of the magnitude of the threat posed by malware, a great deal of research has been done on malware identification. At the moment there are two widely used approaches: dynamic analysis, which obtains features by monitoring program executions, and static analysis, which analyzes features of binary programs without running them. Intuitively, dynamic analysis is the first choice, since it can provide more accurate program behavior data. However, dynamic analysis has many issues in practice. It requires a specially constructed running environment such as a customized Virtual Machine (VM), which is computationally very costly when numerous samples are tested. Furthermore, in order to bypass this defense, some malware alters its behavior when it is detected [26, 9]. The analysis environment can also pick up false positive data from other software running in the same environment.

On the other hand, static analysis methods also have disadvantages. Signature-based methods, such as those built on API calls and N-grams, provide the basis for most commercial antivirus products [28, 19]. While they are widely used, their ability to combat the various encryption, polymorphism and obfuscation methods used by malware attackers is limited. Machine learning based malware classification technologies have been applied to malware detection [30, 41], but they rely heavily on relevant domain knowledge to provide applicable features. This approach can adapt neither to fast-changing malware patterns nor to polymorphism.

In recent years, researchers have begun exploring a new frontier in data mining and machine learning known as deep learning. Deep learning techniques are now being leveraged in malware detection and classification tasks [13, 10, 6, 38]. Sparse Autoencoder (SAE), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN) models are all used to devise malware detection architectures. Although research thus far has provided promising results, there are still many open challenges and opportunities. First of all, due to the quick increase in the amount of novel malware, malicious techniques and patterns are changing and evolving rapidly. As a result, handling novel malware is one of the most pressing issues. In addition, in contrast to the natural language processing or computer vision tasks that are usually explored in deep learning, malware byte files and assembly instructions have less understandable patterns. Therefore data preprocessing techniques are much more important. Furthermore, adversarial attacks against neural networks, which manipulate only a small portion of the input data to cause misclassification, have been proven to be a big vulnerability. Even though these types of adversarial attacks are less common on malware detection models because of the complexity and fragility of binary executables, it is possible to evade deep neural networks for malware binary detection. For example, Kolosnjaji et al. [16] trained a gradient-based model to append bytes to the overlay section of malware samples. Even though the model successfully evaded the deep neural network, both the model and the modification method are rather simple and cannot cover the complicated modifications that real malware writers make, such as modifying various sections based on domain knowledge.

In order to explore the data space more thoroughly, an action space is defined. It consists of 10 different modification methods for binary programs. An evolutionary optimization is used to search for the best action sequence for a specific malware sample.

This paper proposes MDEA, an Adversarial Malware Detection model that combines a deep neural network with an evolutionary optimization during training. MDEA consists of a convolutional neural network that classifies raw byte data from malware binaries, and an evolutionary optimization that modifies the malware samples that are detected. In contrast to simply appending bytes to the end of each file, an action space is defined for the evolutionary optimization to pick from and choose the best action sequences for each malware sample. With the evolutionary learning, the probability that the generated input sample is classified as benign can be maximally increased. The new samples are then fed into the detection network for retraining. The above steps represent a form of adversarial training.


The experiments were performed on 7371 Windows Portable Executable (PE) malware samples and 6917 benign PE samples. The results showed that MDEA not only drastically decreased the detection model accuracy, but also increased the overall detection performance from 90% to 93% after the retraining process. This result shows that adversarial evolutionary training can improve both robustness and performance of the malware detection network.

The rest of the paper is structured as follows. Section 2 presents related work. Section 3 describes an overview of MDEA and the details of each component. Section 4 describes the experimental setup and discusses the results. Section 5 provides suggestions for further research on this topic and Section 6 draws conclusions.

II Related Work

Malware detection and classification has been a studied problem for many years. Notably, in 2015, the open Kaggle contest Microsoft Malware Classification Challenge (BIG 2015) [15] created a lot of interest in malware classification. The champion of this contest used machine learning with sophisticated static pattern analysis. Following that trend, the goal of this paper is to leverage deep learning models without sophisticated feature engineering. This section briefly introduces related work in signature-based, learning-based [38], and adversarial-based [3] approaches, as well as evolutionary techniques.

II-A Signature-based Malware Detection

Signature-based and behavior-based methods are widely used in the anti-malware industry and are often used to identify “known” malware [2]. When an anti-malware solution provider identifies an object as malicious, its signature is added to a database of known malware. These repositories may contain hundreds of millions of signatures that identify malicious objects. One of the major advantages of such signature-based malware detection is that it is thorough: it follows all conceivable execution paths of a given document [35]. Because it is simple to build such a system, signature-based malware detection has been the primary identification technique used by anti-malware products. It remains the base approach used by the latest firewalls, email and network gateways. Therefore, much research has been done in this field. Santos et al. [33] created an opcode sequence-based malware detection system. Preda et al. [24] proposed a semantics-based framework for reasoning about malware detectors. Fraley and Figueroa [8] presented a unique approach leveraging topological examination using signature-based techniques. They also used data mining techniques in order to uncover and spotlight the properties of malicious files. Despite the widespread adoption of signature-based malware detection within the information security industry, malware authors can easily evade this method through techniques such as encryption, polymorphism, and obfuscation. Signature-based analysis is, therefore, poorly equipped to handle the current state of malware generation.
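The database lookup described above, and the ease of evading it, can be illustrated with a minimal sketch. It assumes a signature is simply a SHA-256 digest of the whole file, which is a simplification (commercial signatures are richer than a file hash), and the sample bytes are hypothetical placeholders rather than real malware.

```python
import hashlib
from typing import Set

def signature(data: bytes) -> str:
    """Return a SHA-256 digest, used here as a stand-in for a malware signature."""
    return hashlib.sha256(data).hexdigest()

def is_known_malware(data: bytes, database: Set[str]) -> bool:
    """Flag a file only if its exact signature is already in the database."""
    return signature(data) in database

# Hypothetical sample bytes; not real malware.
known_sample = b"MZ\x90\x00 hypothetical malicious payload"
database = {signature(known_sample)}

# A single appended byte yields a different digest, so the lookup misses it.
variant = known_sample + b"\x00"
print(is_known_malware(known_sample, database))  # True
print(is_known_malware(variant, database))       # False
```

The second lookup fails even though the variant differs by one byte, which is the weakness that encryption, polymorphism, and obfuscation exploit at scale.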

II-B Learning-based Malware Classification

Because of the weaknesses of signature-based malware detection, machine learning has become a popular approach to signatureless malware detection. Many different malware detection approaches using machine learning technology have been proposed in recent years. These approaches include static analysis, which learns the statistical characteristics of malware (e.g. API calls, N-grams), and dynamic behavior analysis, which analyzes the behavior of a system against a baseline in order to determine anomalous (and possibly malicious) behavior. This paper focuses on static analysis. In the Kaggle Microsoft Malware Contest [15], the winner used many sophisticated features for their K-nearest Neighbor (KNN) model in order to achieve high performance. Other machine learning techniques were also studied in different works. Rehman et al. [29] reverse-engineered Android apps to extract manifest files, and employed machine-learning algorithms to detect malware efficiently. They observed that SVM for binaries and KNN for manifest XML files are the most suitable options for robustly detecting malware on Android devices. Yuan et al. [40] proposed to associate features from static analysis with features from dynamic analysis of Android apps and characterize malware using deep-learning techniques.
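As an illustration of the N-gram statistics mentioned above, the following sketch counts byte N-grams in a binary. The function name and the choice of raw bytes (rather than opcodes or API calls) are assumptions made for the example, not the feature pipeline of any cited work.

```python
from collections import Counter

def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every contiguous run of n bytes in data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

counts = byte_ngrams(b"\x90\x90\x90\xcc", n=2)
print(counts.most_common(1))  # [(b'\x90\x90', 2)]
```

Counts like these would typically be turned into a fixed-length feature vector before being passed to a classifier such as KNN or an SVM.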

Unlike all the above research, in order to cut down on the necessity of expert analysis, MDEA only uses basic features as input to a deep learning model. Many different deep learning models have been proposed for malware detection. Some works address this problem with a Long Short-Term Memory (LSTM) model [38], while others propose “malware images” that are generally constructed by treating each byte of the binary as a gray-scale pixel value, and defining an arbitrary “image width” that is used for all images [34]. Of all these approaches, the MalConv network [25], which uses raw byte embeddings as the input, achieved the highest accuracy. Therefore, this model is used as the detection model.
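The “malware image” construction can be sketched in a few lines. This is only an illustration of the idea: the function name is hypothetical, and padding the last row with zeros is an assumption, since the text does not say how incomplete rows are handled.

```python
from typing import List

def bytes_to_image(data: bytes, width: int) -> List[List[int]]:
    """Arrange bytes row by row; each value 0..255 is one gray-scale pixel."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Assumption: pad the last row with zeros so every row has the same width.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]

image = bytes_to_image(bytes([0, 64, 128, 255, 17]), width=2)
print(image)  # [[0, 64], [128, 255], [17, 0]]
```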

Fig. 1: The overall flow of MDEA. Top: detection model based on the MalConv network [25]. Bottom: the evolutionary optimization algorithm. In this cycle, the evolutionary method learns modification patterns for each malware sample to evade the detection network. The newly generated samples are then fed into the detection network to improve its performance. With this cycle, MDEA not only prevents adversarial attacks against the detection model, but also increases its accuracy.

Fig. 2: The details of each layer in the detection model MalConv [25]. The raw binary data is first converted to integers and then fed into an embedding layer and two convolutional layers. Then the resulting vector is fed into a fully connected layer for output.

Fig. 3: The evolutionary optimization cycle. The population initialization and evolution phase generates candidate action sequences. The binary modification phase modifies malware samples based on the action sequences. The population evaluation phase selects the best individual out of a population. The process then repeats for several generations until a good sequence is found.

II-C Adversarial Model for Sample Generation

Nguyen et al. [21] inspired later research on adversarial models. They found that it was easy to produce images that were unrecognizable to humans but that deep neural networks (DNNs) recognized with high confidence. They trained DNNs on the ImageNet and MNIST datasets and produced many human-unrecognizable images. Goodfellow et al. [11] proposed the first generative adversarial nets (GANs). GANs consist of two models. One of them is a generative model, which captures the data distribution; the other is a discriminative model that estimates the probability that a sample came from the training data rather than from the generative model. GANs have large potential since they can learn to mimic any data distribution. Therefore, GANs have been used in many domains such as computer vision and natural language processing.
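For reference, the two-model game described above is usually written as the minimax value function from Goodfellow et al. Here $G$ is the generative model, $D$ is the discriminative model (not to be confused with the detection network used later in this paper), $p_{\text{data}}$ is the data distribution, and $p_z$ is the noise prior fed to the generator.

```latex
\min_{G} \max_{D} V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The first term rewards the discriminator for recognizing real samples; the second rewards it for rejecting generated ones, while the generator tries to minimize that same term.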

Recent work in adversarial machine learning has shown that deep learning models are susceptible to gradient-based attacks. Anderson et al. [3] proposed a more general framework based on reinforcement learning (RL) for attacking static portable executable (PE) anti-malware engines. They showed in experiments that this adversarial learning method can attack a gradient-boosted machine learning model and evade components of publicly hosted antivirus engines. Kolosnjaji et al. [16] proposed a gradient-based attack model that is capable of evading a deep network by changing only a few specific bytes at the end of each malware sample, while preserving its intrusive functionality. They were able to decrease the detection accuracy of the original detection model by more than 50%. Even though this work achieved good results, the generated attack samples were not used to improve the detection model.

This paper will reproduce and improve upon their methods with evolutionary learning and leverage the generated adversarial samples by retraining the deep learning model to further improve the accuracy.

II-D Evolutionary Algorithm

Evolutionary algorithms (EAs) use mechanisms inspired by biological evolution, such as mutation, recombination and selection, to select the best individual of a population to solve an optimization problem. Initially, EAs were considered as a scalable alternative to reinforcement learning [32]. In recent years, EAs have performed much better than traditional optimization methods such as gradient descent in various domains [27, 23]. This advantage of EAs is becoming more important in deep neural networks because of the diversity and complexity they provide [39, 14]. Martín et al. [18] created an Android malware detection system with evolutionary strategies to leverage third-party calls to bypass the effects of concealment strategies. Petroski et al. [36] evolved the weights of a DNN with a genetic algorithm (GA) to perform well on hard deep RL problems, including Atari and humanoid locomotion. Chen et al. built a model to generate groundwater spring potential maps. They utilized a GA to perform feature selection and data mining methods to optimize the set of variables in groundwater spring assessments.

In all the above research, EA showed an advantage over traditional optimization methods such as stochastic gradient descent and reinforcement learning. Its gradient-free nature makes EA less vulnerable to local minima and makes it easier to find general solutions in high-dimensional parameter search problems.

EA is also highly robust and performs well when the number of time steps in an episode is long, when actions have long-lasting effects, and when no good value-function estimates are available [32]. Therefore EA is well suited as the optimization method for MDEA.

III Model

This section presents the design and process of the proposed MDEA model. First, an overview of the model structure, the dataset information and the details of the detection model are presented. Then the definition and details of the action space, which is used by the evolutionary optimization method, are discussed, and the evolutionary optimization algorithm is described.

III-A Structure Overview

Overall, the malware detection process consists of two major parts (Figure 1). The first part involves preprocessing malware sample data and feeding these data to a 1-D convolutional network. The second part uses evolutionary optimization to evolve adversarial malware samples that evade the network. All the newly generated malware samples that successfully evade the detection model are added to the training set, and the detection model is retrained. The first and the second part together form a loop, as shown in Figure 1. During the training phase, this loop is iterated multiple times until the detection accuracy of the detection model converges to an acceptable level.

Since the development set is fitted multiple times during the evolutionary phase, an overfitting check is performed with another small set that is never used during training.
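The train, evolve, retrain loop can be sketched as a small driver function. Everything here is a hypothetical illustration: the callables `train`, `evolve` and `accuracy`, the convergence tolerance `tol`, and the toy stand-ins in the usage part are assumptions, not the authors' implementation.

```python
from typing import Callable, List, Tuple

Model = Callable[[bytes], float]  # maps file bytes to a score

def adversarial_training(
    train: Callable[[List[bytes]], Model],
    evolve: Callable[[Model, List[bytes]], List[bytes]],
    accuracy: Callable[[Model], float],
    train_set: List[bytes],
    malware: List[bytes],
    max_cycles: int = 10,
    tol: float = 1e-3,
) -> Tuple[Model, List[float]]:
    """Alternate retraining and adversarial sample generation until accuracy settles."""
    model = train(train_set)
    history = [accuracy(model)]
    for _ in range(max_cycles):
        evasive = evolve(model, malware)   # samples that evade the current model
        if not evasive:
            break
        train_set = train_set + evasive    # enrich the training set
        model = train(train_set)           # retrain on the enlarged set
        history.append(accuracy(model))
        if abs(history[-1] - history[-2]) < tol:
            break
    return model, history

# Toy stand-ins: the "model" memorises training samples and scores unseen ones as benign.
def toy_train(samples: List[bytes]) -> Model:
    seen = set(samples)
    return lambda x: 0.0 if x in seen else 1.0

def toy_evolve(model: Model, malware: List[bytes]) -> List[bytes]:
    variants = [m + b"\x00" for m in malware]
    return [v for v in variants if model(v) > 0.5]

holdout = [b"a", b"a\x00", b"b\x00"]

def toy_accuracy(model: Model) -> float:
    return sum(model(x) <= 0.5 for x in holdout) / len(holdout)

model, history = adversarial_training(
    toy_train, toy_evolve, toy_accuracy, [b"a", b"b"], [b"a", b"b"]
)
print(history)
```

In the toy run the loop stops once no variant evades the retrained model; in practice the stopping rule would be the accuracy convergence criterion described above.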

III-B Dataset and Detection Model

The dataset consists of 14,288 PE files. Of these, 7371 are malware samples downloaded from VirusShare, and the remaining 6917 are benign files gathered by crawling different websites. The deep neural network trained and attacked in this paper is the MalConv network proposed by Raff et al. [25]. Figure 2 shows the detailed structure of this network. MalConv takes up to $d$ bytes of data as input. Each byte is represented by a number from the alphabet $A = \{0, \dots, 255\}$. The $k$ bytes extracted from the input file are padded with zeros to form a vector $x$ of $d$ elements (if the file contains more than $d$ bytes, only the first $d$ bytes are used, without padding). Each element of $x$ is fed into a trainable embedding layer to get an embedded vector of eight elements. After this embedding process, the one-dimensional vector $x$ becomes a matrix $Z \in \mathbb{R}^{d \times 8}$. This matrix $Z$ is then fed into two 1-D convolutional layers. These two layers use Rectified Linear Unit (ReLU) and sigmoidal activation functions respectively. By combining these two layers with gating, the vanishing gradient problem caused by sigmoidal activation functions is avoided. The obtained values are fed to a temporal max-pooling layer followed by a fully connected layer with ReLU activation. The final classification is made based on the output of the last fully connected layer. If its output is greater than 0.5 then the sample is taken as a benign file, otherwise it is classified as malware.
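The input preparation and the final decision rule can be sketched as follows. The function names and the small `MAX_LEN` are hypothetical choices for the demonstration (the paper later mentions an input size of two million bytes); the network itself is not reproduced here.

```python
from typing import List

MAX_LEN = 16  # hypothetical small limit for the demo only

def to_input_vector(data: bytes, max_len: int = MAX_LEN) -> List[int]:
    """Truncate to max_len bytes, or right-pad with zeros up to max_len."""
    clipped = data[:max_len]
    return list(clipped) + [0] * (max_len - len(clipped))

def label_from_score(score: float, threshold: float = 0.5) -> str:
    """Apply the decision rule stated in the text: above the threshold means benign."""
    return "benign" if score > threshold else "malware"

print(to_input_vector(b"MZ\x90", max_len=5))   # [77, 90, 144, 0, 0]
print(to_input_vector(b"ABCDEFG", max_len=5))  # [65, 66, 67, 68, 69]
print(label_from_score(0.73))                  # benign
```

Each integer in the resulting vector would then index a row of the trainable embedding table, giving one eight-element vector per byte position.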

III-C Action Space

The action space is based on the PE file layout.

A PE file consists of a number of headers and sections that tell the dynamic linker how to map the file into memory. In general, there are three types of layouts in PE: header, section table and data. The header is a data structure that contains basic information on the executable. The section table describes the characteristics of each file section. The data layout contains the actual data related to each section. Malware usually modifies some of these structures to create malicious activities that are hard to detect. The stealth and sensitivity of these modifications make it harder to alter bytes without breaking the malware functionality. However, some of the sections are not important for the program to run. Some of them, such as the overlay section, are even ignored by the OS. Based on such information the following action space is designed, inspired by Anderson et al. [3]. There are 10 different actions with trainable parameters. In the work of Anderson et al., each action accepts random parameters to test evasion on a gradient-boosted decision-tree model with reinforcement learning. However, this randomness becomes a big issue in MDEA since the evolutionary algorithm already introduces enough generality to the problem, and more randomness would prevent the model from converging. Therefore, a parameter set is used with each action to make the model converge within acceptable time. The actions are:

  • Add a function to the import address table that is never used

  • Manipulate existing section names

  • Append bytes to extra space at the end of sections

  • Create a new entry point

  • Manipulate signature

  • Manipulate debug info

  • Pack the file

  • Unpack the file

  • Modify header checksum

  • Append bytes to overlay section.
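The action list above can be represented as an enumeration, with an individual being an ordered list of (action, parameter) pairs. The identifier names, the use of a single integer parameter, and the zero fill byte are assumptions made for this sketch; only the simplest action, appending to the overlay, is implemented, since bytes placed after the last section of a PE file form the overlay.

```python
from enum import Enum
from typing import List, Tuple

class Action(Enum):
    ADD_UNUSED_IMPORT = 0
    RENAME_SECTION = 1
    APPEND_TO_SECTION = 2
    NEW_ENTRY_POINT = 3
    MANIPULATE_SIGNATURE = 4
    MANIPULATE_DEBUG = 5
    PACK = 6
    UNPACK = 7
    MODIFY_CHECKSUM = 8
    APPEND_OVERLAY = 9

# An individual is an ordered list of (action, parameter) pairs.
ActionSequence = List[Tuple[Action, int]]

def append_overlay(data: bytes, n_bytes: int, fill: int = 0) -> bytes:
    """Append n_bytes copies of fill after the end of the file contents."""
    return data + bytes([fill]) * n_bytes

sequence: ActionSequence = [(Action.APPEND_OVERLAY, 4)]
sample = b"MZ hypothetical file contents"
for action, param in sequence:
    if action is Action.APPEND_OVERLAY:
        sample = append_overlay(sample, param)
print(len(Action), sample[-4:])  # 10 b'\x00\x00\x00\x00'
```

The other nine actions need a PE parsing and rewriting library and are deliberately left out of this sketch.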

Fig. 4: The performance difference between a dead species and a normal species. The x-axis is the number of modified bytes. The y-axis is the detection accuracy. The big gap between the two lines shows how dead species can affect the performance.
Fig. 5: Detection accuracy (y-axis) improves with the number of generations (x-axis), and eventually flattens out at about 10 generations.
Fig. 6: Detection accuracy (y-axis) improves with the number of modified bytes (x-axis). The largest improvement is between 8000 bytes and 16,000 bytes, and the curve flattens at both ends.

Note that some of these actions, such as deleting a signature, are not recoverable. Once the evolutionary optimization algorithm chooses to perform such an action, all later generations of this malware will not be able to effectively perform the same action again. This irreversibility causes the diversity to drop drastically. This issue is addressed in more detail in Section 3.4.

III-D Dead Species

This section discusses the dead species problem. There are many actions in the action space that are not reversible, such as removing signatures and modifying checksum sections. If any of these actions are performed on the malware, a later generation will not be able to reverse it. Since it is unlikely to find an optimal action sequence at the beginning of evolution, picking such irreversible actions drastically reduces the search space. The offspring that contain those actions will be stuck at a bad local optimum.

Figure 4 shows an example of such a dead species. There is an obvious gap between the normal species and the dead species after 16,000 modified bytes. Investigation of these two evolutions showed that the dead species picked the “delete-checksum” and “remove signature” actions when the number of modified bytes was equal to 15,879. This result validates the conjecture that irreversible actions cause the dead species problem.

In order to mitigate this problem, validation weights can be introduced into the evolutionary optimization algorithm. These weights represent the probability of each action being picked. The actions that can cause dead species are assigned a very low probability. This countermeasure worked well in practice and increased the average accuracy by 1%. However, the validation weights only delay the occurrence of dead species, instead of actually solving the problem. Any individuals that pick irreversible actions lead to dead species and lose their diversity in evolution. Further possible techniques for dealing with this problem are discussed in Section 5.
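A minimal sketch of validation weights, assuming they are applied as sampling weights when actions are drawn. The action names and the specific weight values (1.0 versus 0.01) are hypothetical; the text only states that irreversible actions receive a very low probability.

```python
import random
from typing import Dict, List

# Hypothetical weights: irreversible actions get a very low probability.
ACTION_WEIGHTS: Dict[str, float] = {
    "add_unused_import": 1.0,
    "rename_section": 1.0,
    "append_to_section": 1.0,
    "new_entry_point": 1.0,
    "manipulate_debug": 1.0,
    "pack": 1.0,
    "unpack": 1.0,
    "append_overlay": 1.0,
    "manipulate_signature": 0.01,  # irreversible
    "modify_checksum": 0.01,       # irreversible
}

def sample_actions(k: int, rng: random.Random) -> List[str]:
    """Draw k actions with probability proportional to their validation weight."""
    names = list(ACTION_WEIGHTS)
    weights = [ACTION_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=k)

rng = random.Random(0)
picked = sample_actions(1000, rng)
print(picked.count("manipulate_signature"), picked.count("append_overlay"))
```

Because the irreversible actions keep a small nonzero weight, they can still be drawn eventually, which matches the observation that the weights delay dead species rather than prevent them.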

III-E Evolutionary Optimization

A framework called DEAP (Distributed Evolutionary Algorithms in Python) is used to construct the evolutionary optimization algorithm. The algorithm consists of three major parts: population initialization and evolution, binary modification, and population evaluation (Figure 3).

The population evolution part breeds new children using mutation and crossover. Mutation alters one or more gene values in a chromosome. There are many different types, such as shrink mutation [31], uniform mutation and boundary mutation. The “mutShuffleIndexes” method in DEAP was chosen, which shuffles the attributes of the input individual. During the shuffle phase, there is also a probability of replacing some of the elements in those attributes with randomly chosen elements. In contrast, crossover combines the genetic information of two parents to generate new offspring. It is one way to stochastically generate new solutions from an existing population. The uniform crossover method was chosen for MDEA, which chooses each attribute from either parent with equal probability.
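The two operators can be illustrated with plain-Python stand-ins. These are not DEAP's own functions: they only mimic the behavior described above (swap-based shuffling plus occasional random replacement, and per-position uniform crossover), and the probabilities `indpb` and `replace_pb` as well as the gene names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Gene = Tuple[str, int]  # (action name, parameter); names are hypothetical

def uniform_crossover(p1: Sequence[Gene], p2: Sequence[Gene],
                      rng: random.Random) -> List[Gene]:
    """Take each position from either parent with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

def shuffle_mutation(ind: Sequence[Gene], indpb: float, replace_pb: float,
                     pool: Sequence[Gene], rng: random.Random) -> List[Gene]:
    """Swap positions with probability indpb, then replace genes with probability replace_pb."""
    child = list(ind)
    for i in range(len(child)):
        if rng.random() < indpb:
            j = rng.randrange(len(child))
            child[i], child[j] = child[j], child[i]
    for i in range(len(child)):
        if rng.random() < replace_pb:
            child[i] = rng.choice(pool)
    return child

rng = random.Random(1)
mom = [("append_overlay", 1024), ("rename_section", 0), ("pack", 0)]
dad = [("add_unused_import", 3), ("append_to_section", 512), ("unpack", 0)]
child = uniform_crossover(mom, dad, rng)
mutant = shuffle_mutation(child, indpb=0.2, replace_pb=0.1, pool=mom + dad, rng=rng)
print(child)
print(mutant)
```

Keeping the action and its parameter together in one gene means crossover never separates a parameter from the action it belongs to.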

Fig. 7: Detection accuracy as a function of random actions. The x-axis is the number of bytes modified by random actions; the y-axis is the detection accuracy.

After the population is evolved, the action sequences are sent to the binary modification part to produce modified malware. The modified malware is evaluated by the detection model, and the statistics for evolving the next generation are calculated. After the evaluation step, the selection method picks the best offspring as the parents for the next generation. For simplicity, the “selectBest” selection algorithm was used in MDEA. This cycle continues until there is enough data for further training, or the generation limit is reached.

Let us denote the set of malware that the detection model detects as $M_d$, and the set of malware that it does not detect as $M_u$. The goal of evolution is to find a function $f$ such that $f(m) \in M_u$ for as many $m \in M_d$ as possible. The function $f$ is represented by an action sequence $(a_0, p_0), (a_1, p_1), (a_2, p_2), \dots, (a_n, p_n)$ where $a_i$, $i \in [0..n]$, is an action from the action space and $p_i$ is the corresponding parameter for that action (e.g. for the appending overlay method, $p_i$ indicates how many bytes to append at the end). Evolutionary optimization aims to find $f$ by crossover, mutation and selection of such sequences. Let us denote the detection model as $D$. For any malware sample $m$, if $D(m) > 0.5$, then $m \in M_u$. Thus, the goal of the evolutionary model is to maximize $D(f(m))$ where $m \in M_d$. With these definitions, the overall algorithm for the evolution is shown in Algorithm 1.
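The objective can be written compactly as follows. The notation is an assumption chosen for this summary: $M_d$ is the set of detected malware, $f$ is the modification function induced by an action sequence with actions $a_i$ and parameters $p_i$, $D$ is the detection model, and a score above 0.5 counts as "not detected", following the decision rule given for the detection model.

```latex
f^{*} = \arg\max_{f} \sum_{m \in M_d} \mathbb{1}\big[\, D(f(m)) > 0.5 \,\big],
\qquad
f = \big( (a_0, p_0), (a_1, p_1), \dots, (a_n, p_n) \big)
```

For a single sample $m$, the quantity used as fitness during evolution is $D(f(m))$ itself, since raising it above the threshold is what moves $m$ out of the detected set.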

IV Experiments

This section will discuss the experimental setup and results. First, the details of the datasets, their division into training and validation sets, and the hardware used to run this experiment are described. The experimental results and their importance are shown in terms of generations, modification sizes and random actions.

IV-A Experimental Setup

To set up the experiment, 7371 malware samples were collected from VirusShare, a malware sample data website, and 6917 benign samples were collected through web crawling. All 14,288 samples are Windows Portable Executables (PE). The dataset was divided into a training set and a validation set with a 9:1 ratio. An extra test set is also used to check for overfitting. It consists of 100 malware samples that are never used during the training or evolutionary phases.
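The 9:1 division works out as follows for 14,288 samples. This is a sketch under stated assumptions: the text does not say whether the split was shuffled or stratified, so the seeded shuffle and the function name are illustrative only, and integer labels stand in for the actual files.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_9_to_1(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle and split into training (90%) and validation (10%) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 9) // 10
    return shuffled[:cut], shuffled[cut:]

# Stand-in labels only: 7371 malware (1) and 6917 benign (0) entries.
dataset = [1] * 7371 + [0] * 6917
train, val = split_9_to_1(dataset)
print(len(train), len(val))  # 12859 1429
```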

Both the neural network training and the evolutionary optimization were run on the Texas Advanced Computing Center (TACC) Maverick2 server with 4 Nvidia 1080-TI GPUs and 16 Intel Xeon CPUs. The training time for the detection network was around 10 hours and the running time for the evolutionary optimization was around 24 hours per cycle. The accuracy and the number of modified bytes after each cycle were recorded.

Algorithm 1: Malware Sample Evolution

Parameters:
    M: malware training set; m: a malware sample in M
    F: pool of candidates for the next population
    D: detection network used for evaluation
    O: current population of (A, P, e) triples; e: evaluation score
    S: population size; C: candidate pool size
    T: evaluation score threshold; G: generation limit
    r: modified malware; g: current generation; s: modification step
Result: set R that contains a list of (m, A, P), where m is a malware sample, A is an action sequence, and P is a parameter set.

for m in M do
    g = 0
    O = RandomPickAPpairs(S)
    for (A, P, e) in O do
        r = ModifyMalware(A, P, m)
        e = D(r)
    end for
    (A, P, e) = FindBestIndividual(O)
    while e < T and g < G do
        s = 0
        for s < C do
            (A, P) = CrossOverGoodParents(O, (A, P))
            (A, P) = MutateGoodParents((A, P))
            r = ModifyMalware(A, P, m)
            e = D(r)
            F = append(F, (A, P, e))
            s++
        end for
        O = PickGoodIndividuals(F, S)
        (A, P, e) = FindBestIndividual(O)
        g++
    end while
    R = append(R, (m, A, P))
end for
return R
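The per-sample loop of Algorithm 1 can be sketched in Python with toy components. All of the following are assumptions made for illustration: the length-based `toy_detector`, the `modify` function that only implements overlay appending, the default sizes and probabilities, and keeping parents in the candidate pool. None of it is the authors' code.

```python
import random
from typing import Callable, List, Tuple

Gene = Tuple[str, int]   # (action name, parameter); names are hypothetical
Individual = List[Gene]

def modify(sample: bytes, seq: Individual) -> bytes:
    """Toy modifier: only 'append_overlay' changes the bytes; other actions are no-ops."""
    for action, param in seq:
        if action == "append_overlay":
            sample += bytes(param)   # append `param` zero bytes
    return sample

def evolve_sample(sample: bytes, detector: Callable[[bytes], float],
                  gene_pool: List[Gene], rng: random.Random,
                  pop_size: int = 8, pool_size: int = 16, seq_len: int = 3,
                  threshold: float = 0.5, max_gen: int = 10) -> Tuple[Individual, float]:
    """Evolve one action sequence for one sample until it evades or the limit is hit."""
    def score(ind: Individual) -> float:
        return detector(modify(sample, ind))

    population = [[rng.choice(gene_pool) for _ in range(seq_len)]
                  for _ in range(pop_size)]
    best = max(population, key=score)
    generation = 0
    while score(best) <= threshold and generation < max_gen:
        candidates = list(population)            # keep parents (an assumption)
        for _ in range(pool_size):
            p1, p2 = rng.sample(population, 2)   # pick two parents
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            if rng.random() < 0.3:               # occasional random gene replacement
                child[rng.randrange(seq_len)] = rng.choice(gene_pool)
            candidates.append(child)
        population = sorted(candidates, key=score, reverse=True)[:pop_size]
        best = population[0]
        generation += 1
    return best, score(best)

def toy_detector(data: bytes) -> float:
    """Hypothetical detector: the 'benign' score grows with file length."""
    return min(1.0, len(data) / 4096)

gene_pool = [("append_overlay", 256), ("append_overlay", 1024),
             ("rename_section", 0), ("modify_checksum", 0)]
sample = b"MZ" + bytes(100)
best, best_score = evolve_sample(sample, toy_detector, gene_pool, random.Random(0))
print(best, round(best_score, 3))
```

In a real setting `modify` would rewrite the PE file and `detector` would be the trained network; the loop structure of evaluate, select, recombine, and mutate stays the same.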

IV-B Experimental Result

The result graphs are shown in Figures 5 and 6.

Figure 5 shows that the detection accuracy of the detection model increases with the number of generations. After 10 generations of training, which takes around 12 days, the accuracy increased from 90% to around 93%. Note that there is a drop at generation 4 in the graph, which is suspected to be caused by the dead species issue discussed in Section 3.4.

Figure 6 shows the relation between the detection accuracy and the number of modified bytes. There is a big jump from 8000 bytes to 12,000 bytes. A possible reason is that some of the sections in the PE file have specific length requirements and when the number of modified bytes increases above the thresholds, the modification bytes start to capture more malware patterns, which increases the accuracy by a big margin.

In order to show that the accuracy is improved by evolutionary optimization rather than by simple data enrichment, a follow-up experiment was conducted. Random actions were applied to the malware instead of the evolution procedure. The resulting samples were then added to the training set. The result is shown in Figure 7. The graph shows that the detection accuracy never rises above 0.912, compared to 0.93 in Figure 5. Therefore, the evolutionary optimization can help to learn the malware patterns.

V Discussion and Future Work

This section explains the design choices for MDEA. It also evaluates how well the current approach worked and what could be improved in future work.

There are several design choices. The first one is the detection model, i.e. the MalConv network [25]. Several malware detection methods were researched, such as malware images [20], N-gram, K-nearest neighbor (Kaggle 2015), and an LSTM sequence model [38]. After testing with the dataset, MalConv achieved the highest detection accuracy. Therefore, MalConv was chosen as the detection network.

Another design choice was to use an evolutionary algorithm (EA) as the optimization method to generate malware samples instead of GANs or reinforcement learning (RL). EA has several advantages. Existing GANs and their variants suffer from training problems such as instability and mode collapse. EA can achieve a more stable training process. GANs usually employ a pre-defined adversarial objective function, alternately training a generator and a discriminator [37]. However, the action space cannot be simply expressed as a single adversarial objective function. EA solves this problem by evolving a population of different adversarial objective functions (different action sequences). Compared to RL, EA does not need to backpropagate the action weights and biases, which makes the code 2-3 times faster in practice. EA is also highly parallelizable compared to RL since it only requires individuals to communicate a few scalars between each other. Finally, EA is also more robust from the perspective of scaling [32]. It is very easy to extend the action space and other parameters to achieve a different learning outcome. With the above benefits, EA was chosen instead of GANs or RL.

The current setup of MDEA increases detection accuracy from 90% to 93%. Even though the result is promising, there are still some ways in which it can be improved. First, the evolutionary optimization still suffers from the dead species problem. The validation weights introduced in Section 3.4 only delay the occurrence of dead species instead of solving it. However, it may be possible to find an ordering of the action sequence such that it is not necessary to reverse irreversible actions. One possible way to find such an order is to write ordering rules by hand; another way is to learn them from the data itself. Even if the rules are not perfect, they may help avoid dead species in many cases.

Overfitting of the evolutionary optimization is another area for future study. The current solution is to use an unseen test set. However, this solution requires more data gathering time and can only be applied once. Recently, Feldman et al.[7] showed that having multiple classes reduces the amount of overfitting caused by reusing a development set. With their theory and method, it is possible to extend the malware detection problem to a multi-class malware classification problem and thereby reduce overfitting to the development set. The plan is to relabel each malware sample with its malware class and convert the detection problem into a multi-class classification problem to alleviate overfitting.

Another future plan is to expand the search space for the evolutionary optimization algorithm by defining more modification actions. The number of actions in the action space is one of the key factors in ensuring the diversity and generality of the evolutionary optimization. However, because of the fragility and sensitivity of binary EXE code, it is very hard to create new modification actions without changing the functionality of the program.

The size and generality of the dataset can also be improved further. Since the search space is constrained by the diversity of malware types, adding different kinds of malware to the dataset can potentially improve the optimal detection accuracy. The input size of the detection model can also be increased. Currently, MDEA takes two million bytes as input. Many malware samples in the dataset are larger than two million bytes, and the extra bytes are cut off because of the length limit. The plan is to run MDEA on GPUs with more memory, so that more data can be fed into the detection model, which may result in better performance. In addition, a future analysis is needed of how randomly modified malware influences the detection model. Such a validation would make it possible to verify the significance of the evolutionary optimization algorithm.
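The length limit described above can be sketched as a small preprocessing helper. The two-million-byte limit comes from the text; the function name `to_fixed_length`, the zero padding byte, and the choice to pad shorter files at all are assumptions for illustration and may differ from the actual MDEA preprocessing.

```python
MAX_LEN = 2_000_000  # input length stated in the text


def to_fixed_length(raw: bytes, max_len: int = MAX_LEN, pad_byte: int = 0) -> bytes:
    """Truncate or right-pad a raw binary to exactly `max_len` bytes.

    Bytes beyond `max_len` are dropped, which is the information loss
    the text refers to. The padding value is an assumption.
    """
    if len(raw) >= max_len:
        return raw[:max_len]
    return raw + bytes([pad_byte]) * (max_len - len(raw))


short_input = to_fixed_length(b"MZ")
long_input = to_fixed_length(b"\x90" * (MAX_LEN + 10))
print(len(short_input), len(long_input))
```

Raising `max_len` keeps more of each large sample but increases the memory needed per input, which is why the text ties this change to available GPU memory.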

VI Conclusion

This paper proposed MDEA, an evolutionary adversarial malware detection model that combines neural networks with evolutionary optimization. An action space is introduced, which contains 10 different binary modification actions. The evolutionary algorithm evolves different action sequences by picking actions from the action space and then tests these action sequences against the detection model. After action sequences that bypass the detection model have been evolved, they are applied to the corresponding malware samples to form a new training set for the detection model. By retraining the network on this set, detection accuracy increased by a significant margin even with limited computing power.
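The step of turning evolved action sequences into additional training data can be sketched as follows. This is an illustrative outline only: the two toy actions, the dictionary layout, and the label convention (1 for malicious) are assumptions, and real modification actions would have to preserve the functionality of the executable.

```python
# Toy byte-level actions standing in for real, functionality-preserving ones.
TOY_ACTIONS = {
    "append_zero": lambda data: data + b"\x00",
    "append_ff": lambda data: data + b"\xff",
}


def apply_actions(sample: bytes, sequence, actions) -> bytes:
    """Apply each named action in order to a raw sample."""
    for name in sequence:
        sample = actions[name](sample)
    return sample


def build_augmented_set(samples, evasive_sequences, actions):
    """Return (bytes, label) pairs: originals plus their evolved variants.

    samples:           dict of sample id -> raw malware bytes
    evasive_sequences: dict of sample id -> list of action sequences that
                       bypassed the detector for that sample
    """
    augmented = [(data, 1) for data in samples.values()]
    for sample_id, sequences in evasive_sequences.items():
        for seq in sequences:
            augmented.append((apply_actions(samples[sample_id], seq, actions), 1))
    return augmented


samples = {"sample_a": b"MZ"}
evasive = {"sample_a": [["append_zero", "append_ff"]]}
training_set = build_augmented_set(samples, evasive, TOY_ACTIONS)
print(training_set)
```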

These results show that deep learning-based malware detection can defend against adversarial attacks and that its accuracy can be further improved by evolutionary learning. Evolutionary optimization provides generality and diversity that are difficult to achieve with other optimization algorithms.


  • [1] A. Acquisti and J. Grossklags Privacy attitudes and privacy behavior. Economics of Information Security Advances in Information Security, pp. 165–178. External Links: Document Cited by: §I.
  • [2] (2017) Advanced malware detection - signatures vs. behavior analysis. Cited by: §II-A.
  • [3] H. S. Anderson, A. Kharkar, B. Filar, D. Evans, and P. Roth (2018-01) Learning to evade static pe machine learning malware models via reinforcement learning. arXiv preprint arXiv:1801.08917. External Links: 1801.08917 Cited by: §II-C, §II, §III-C.
  • [4] C. Beek, C. Palm, E. Peterson, R. Samani, C. Schmugar, R. Sims, D. Sommer, and B. Sun (2018) McAfee labs threats report. Cited by: §I.
  • [5] W. Chen, P. Tsangaratos, I. Ilia, Z. Duan, and X. Chen (2019-05) Groundwater spring potential mapping using population-based evolutionary algorithms and data mining methods. Science of The Total Environment 684. External Links: Document Cited by: §II-D.
  • [6] J. Drew, M. Hahsler, and T. Moore (2017) Polymorphic malware detection using sequence classification methods and ensembles. EURASIP Journal on Information Security 2017 (1), pp. 2. Cited by: §I.
  • [7] V. Feldman, R. Frostig, and M. Hardt (2019) The advantages of multiple classes for reducing overfitting from test set reuse. External Links: 1905.10360 Cited by: §V.
  • [8] J. B. Fraley and M. Figueroa (2016) Polymorphic malware detection using topological feature extraction with data mining. SoutheastCon 2016. External Links: Document Cited by: §II-A.
  • [9] T. Garfinkel, K. Adams, A. Warfield, and J. Franklin Compatibility is not transparency: vmm detection myths and …. External Links: Link Cited by: §I.
  • [10] D. Gibert (2016) Convolutional neural networks for malware classification. Ph.D. Thesis, MS Thesis, Dept. of Computer Science, UPC. Cited by: §I.
  • [11] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial networks. External Links: 1406.2661 Cited by: §II-C.
  • [12] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), pp. 2672–2680. External Links: Link Cited by: §I.
  • [13] W. Hardy, L. Chen, S. Hou, Y. Ye, and X. Li (2016) DL4MD: a deep learning framework for intelligent malware detection. In Proceedings of the International Conference on Data Mining (DMIN), pp. 61. Cited by: §I.
  • [14] O. M. J. Hooman, M. M. Al-Rifaie, and M. A. Nicolaou (2018-Sep.) Deep neuroevolution: training deep neural networks for false alarm detection in intensive care units. In 2018 26th European Signal Processing Conference (EUSIPCO), pp. 1157–1161. External Links: Document, ISSN 2219-5491 Cited by: §II-D.
  • [15] (2015) Kaggle:microsoft malware classification challenge (big 2015). Cited by: §II-B, §II.
  • [16] B. Kolosnjaji, A. Demontis, B. Biggio, D. Maiorca, G. Giacinto, C. Eckert, and F. Roli (2018) Adversarial malware binaries: evading deep learning for malware detection in executables. External Links: 1803.04173 Cited by: §I, §II-C.
  • [17] K. Lab (2015) Carbanak apt: the great bank robbery. Securelist. Cited by: §I.
  • [18] A. Martín, H. D. Menéndez, and D. Camacho (2017-12-01) MOCDroid: multi-objective evolutionary classifier for android malware detection. Soft Computing 21 (24), pp. 7405–7415. External Links: ISSN 1433-7479, Document, Link Cited by: §II-D.
  • [19] M. Narouei, M. Ahmadi, G. Giacinto, H. Takabi, and A. Sami (2015) DLLMiner: structural mining for malware detection. Security and Communication Networks 8 (18), pp. 3311–3322. Cited by: §I.
  • [20] L. Nataraj, S. Karthikeyan, G. Jacob, and B.S. Manjunath (2011) Malware images: visualization and automatic classification. In International Symposium on Visualization for Cyber Security (VizSec), Cited by: §V.
  • [21] A. Nguyen, J. Yosinski, and J. Clune (2014) Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. External Links: 1412.1897 Cited by: §II-C.
  • [22] A. Plonk and A. Carblanc (2008) Malicious software (malware): a security threat to the internet economy. Cited by: §I.
  • [23] E. V. Podryabinkin, E. V. Tikhonov, A. V. Shapeev, and A. R. Oganov (2019-02) Accelerating crystal structure prediction by machine-learning interatomic potentials with active learning. Physical Review B 99 (6). External Links: ISSN 2469-9969, Link, Document Cited by: §II-D.
  • [24] M. D. Preda, M. Christodorescu, S. Jha, and S. Debray (2007) A semantics-based approach to malware detection. ACM SIGPLAN Notices 42 (1), pp. 377. External Links: Document Cited by: §II-A.
  • [25] E. Raff, J. Barker, J. Sylvester, R. Brandon, B. Catanzaro, and C. Nicholas (2017) Malware detection by eating a whole exe. External Links: 1710.09435 Cited by: Fig. 1, Fig. 2, §II-B, §III-B, §V.
  • [26] T. Raffetseder, C. Kruegel, and E. Kirda Detecting system emulators. Lecture Notes in Computer Science Information Security, pp. 1–18. External Links: Document Cited by: §I.
  • [27] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le (2018) Regularized evolution for image classifier architecture search. External Links: 1802.01548 Cited by: §II-D.
  • [28] D. K. S. Reddy and A. K. Pujari (2006) N-gram analysis for computer virus detection. Journal in Computer Virology 2 (3), pp. 231–239. Cited by: §I.
  • [29] Z. Rehman, S. N. Khan, K. Muhammad, J. W. Lee, Z. Lv, S. W. Baik, P. A. Shah, K. Awan, and I. Mehmood (2018) Machine learning-assisted signature and heuristic-based detection of malwares in android devices. Computers and Electrical Engineering 69, pp. 828–841. External Links: Document Cited by: §II-B.
  • [30] K. Rieck, P. Trinius, C. Willems, and T. Holz (2011) Automatic analysis of malware behavior using machine learning. Journal of Computer Security 19 (4), pp. 639–668. Cited by: §I.
  • [31] C. C. D. Ronco and E. Benini (2013-12) A simplex-crossover-based multi-objective evolutionary algorithm. Lecture Notes in Electrical Engineering IAENG Transactions on Engineering Technologies, pp. 583–598. External Links: Document Cited by: §III-E.
  • [32] T. Salimans, J. Ho, X. Chen, S. Sidor, and I. Sutskever (2017) Evolution strategies as a scalable alternative to reinforcement learning. External Links: 1703.03864 Cited by: §II-D, §II-D, §V.
  • [33] I. Santos, F. Brezo, J. Nieves, Y. K. Penya, B. Sanz, C. Laorden, and P. G. Bringas (2010-02) Idea: opcode-sequence-based malware detection. Springer, Berlin, Heidelberg. External Links: Link Cited by: §II-A.
  • [34] A. Singh, A. Handa, N. Kumar, and S. K. Shukla (2019) Malware classification using image representation. In CSCML, Cited by: §II-B.
  • [35] A. Souri and R. Hosseini (2018-12) A state-of-the-art survey of malware detection approaches using data mining techniques. Human-centric Computing and Information Sciences 8 (1). External Links: Document Cited by: §II-A.
  • [36] F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley, and J. Clune (2017) Deep neuroevolution: genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. External Links: 1712.06567 Cited by: §II-D.
  • [37] C. Wang, C. Xu, X. Yao, and D. Tao (2018) Evolutionary generative adversarial networks. External Links: 1803.00657 Cited by: §V.
  • [38] J. Yan, Y. Qi, and Q. Rao (2018) Detecting malware with an ensemble method based on deep neural network. Security and Communication Networks 2018. Cited by: §I, §II-B, §II, §V.
  • [39] S. R. Young, D. C. Rose, T. P. Karnowski, S. Lim, and R. M. Patton (2015) Optimizing deep learning hyper-parameters through an evolutionary algorithm. Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments - MLHPC 15. External Links: Document Cited by: §II-D.
  • [40] Z. Yuan, Y. Lu, and Y. Xue (2016) Droiddetector: android malware characterization and detection using deep learning. Tsinghua Science and Technology 21 (1), pp. 114–123. External Links: Document Cited by: §II-B.
  • [41] M. Zakeri, F. Faraji Daneshgar, and M. Abbaspour (2015) A static heuristic approach to detecting malware targets. Security and Communication Networks 8 (17), pp. 3015–3027. Cited by: §I.