Behavioral Malware Classification using Convolutional Recurrent Neural Networks

11/19/2018 ∙ by Bander Alsulami, et al. ∙ Drexel University

Behavioral malware detection aims to improve on the performance of static signature-based techniques used by anti-virus systems, which are less effective against modern polymorphic and metamorphic malware. Behavioral malware classification aims to go beyond the detection of malware by also identifying a malware's family according to a naming scheme such as the ones used by anti-virus vendors. Behavioral malware classification techniques use run-time features, such as file system or network activities, to capture the behavioral characteristic of running processes. The increasing volume of malware samples, diversity of malware families, and the variety of naming schemes given to malware samples by anti-virus vendors present challenges to behavioral malware classifiers. We describe a behavioral classifier that uses a Convolutional Recurrent Neural Network and data from Microsoft Windows Prefetch files. We demonstrate the model's improvement on the state-of-the-art using a large dataset of malware families and four major anti-virus vendor naming schemes. The model is effective in classifying malware samples that belong to common and rare malware families and can incrementally accommodate the introduction of new malware samples and families.

1 Introduction and Background

Malware classification is the process of assigning a malware sample to a specific malware family. Malware within a family shares similar properties that can be used to create signatures for detection and classification. Signatures can be categorized as static or dynamic based on how they are extracted. A static signature can be based on a byte-code sequence [24], binary assembly instruction [31], or an imported Dynamic Link Library (DLL) [38]. Dynamic signatures can be based on file system activities [18, 40], terminal commands [43], network communications [26, 46], or function and system call sequences [37, 20, 2].

Behavioral signatures have become a useful complement to static signatures, which can be obfuscated easily and automatically [42]. For example, polymorphic and metamorphic malware mutate their appearance and structure without affecting their behavior [41, 14]. Behavioral features capture run-time information such as file system activities, memory allocations, network communications, and system calls during the execution of a program. Such features make behavioral malware classifiers more resilient to static obfuscation methods.

Each anti-virus vendor has a unique labeling format for malware families. The format often includes the target platform (e.g., Windows, Linux), the malware category (e.g., trojan, worm, ransomware), and an arbitrary character that describes the generation. For example, a malware sample that belongs to the ransomware family Cerber is labeled Ransom:Win32/Cerber.a according to the naming scheme in the Microsoft Windows Defender anti-virus system. Such naming schemes are used to simplify the classification of malware samples, track their evolution, and associate them with an effective counter-response. The performance of behavioral classification models depends on the ground truth labels assigned by the various anti-virus naming schemes at training. Unfortunately, the naming schemes are inconsistent across anti-virus vendors [35], which complicates the training and evaluation process. This work describes a new malware classification model that performs consistently better than other models described in previous work using various anti-virus ground truth labeling schemes.
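As an illustration of the label structure described above, the following minimal sketch splits a Microsoft-style label into its parts. The regular expression, the MalwareLabel type, and the parse_label function are hypothetical names introduced here, and the assumed layout (category, colon, platform, slash, family, optional dot-separated variant) is inferred only from the Ransom:Win32/Cerber.a example; other vendors use different layouts.

```python
import re
from typing import NamedTuple, Optional


class MalwareLabel(NamedTuple):
    category: str
    platform: str
    family: str
    variant: Optional[str]


# Assumed layout: category ":" platform "/" family, then an optional "." + variant.
_LABEL_RE = re.compile(
    r"^(?P<category>[^:]+):(?P<platform>[^/]+)/(?P<family>[^.!]+)(?:\.(?P<variant>\w+))?"
)


def parse_label(label: str) -> MalwareLabel:
    """Split a Microsoft-style label into category, platform, family, and variant."""
    match = _LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized label format: {label!r}")
    return MalwareLabel(
        match["category"], match["platform"], match["family"], match["variant"]
    )


print(parse_label("Ransom:Win32/Cerber.a"))
```

Grouping samples by the family field, rather than by the full label string, is what turns such labels into classification targets.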

This paper presents our contributions to behavioral malware classification using information gathered from Microsoft Windows Prefetch files. We demonstrate that our technique achieves a high classification score on common malware families for a large number of samples. We measure the generalization of our malware classification model on four different anti-virus scan engines. We demonstrate the robustness of our model on rare malware families with small sample sizes. We also evaluate the ability of our model to include the correct malware family in its top predictions. Finally, we present our model’s capacity to learn the behavior of newly discovered malware samples and families.

The paper is organized as follows: Section 2 describes previous related work. Section 3 explains Microsoft Windows Prefetch files, which are used as dynamic features in our model. Section 4 describes the architecture of our behavioral malware classification model. Section 5 explains how the dataset used in the experiment was created from the ground truth labeled data. Section 6 evaluates our model against previous work on behavioral malware classification. Finally, Section 7 outlines our conclusions and future work.

2 Related Work

Behavioral malware classification has been researched extensively to mitigate the shortcomings of static malware classification. Malware that uses advanced obfuscation techniques, such as polymorphism and metamorphism, is a challenge for detection and classification using static analysis techniques [15, 29]. Researchers introduced new dynamic features to profile the behavior of malware samples. They extract the program control graphs [25] and measure the similarity between malware within the same family. The work described in [36, 9, 20] used sequences of function/system calls to model the behavior of malware and applied machine learning techniques to group malware with similar behavior into a common family.

The disparity between anti-virus vendors’ naming schemes affects the performance of behavioral malware classifiers [3, 8, 21]. A common solution is to cluster malware based on their observed behavior using unsupervised machine learning [3]. However, malware samples that are difficult to cluster are often left out [27]. A method to overcome the disparity between anti-virus scan engine labels is to cluster multiple ground truth labels into a single valid ground truth source [34]. Another solution aggregates labels in conjunction with supervised and unsupervised machine learning techniques to infer suitable labels [21].

Our work is distinct from previous efforts in that we build a Convolutional Recurrent Neural Network that uses new dynamic features extracted from Windows Prefetch files to classify malware. The model should outperform previous work using any anti-virus labeling scheme, should perform consistently regardless of the ground truth labels, and should be able to classify malware into both common and rare malware families.

3 Microsoft Windows Prefetch Files

Prefetch files contain a summary of the behavior of Windows applications. The Windows operating system uses Prefetch files to speed up the booting process and launch time of Windows programs. The Windows Cache Manager (WCM) monitors the first two minutes of the booting process and another sixty seconds after all system services are loaded. Similarly, WCM monitors each application for ten seconds while it is running. The prefetching process analyzes the usage patterns of Windows applications while they load their dependency files such as dynamic link libraries, configuration files, and executable binary files. WCM stores the information for each application in files with a .PF extension inside the system directory named Prefetch.

Prefetch files store relevant information about the behaviors of the application, which can be used for memory security forensics, system resources auditing, and Rootkit detection [4, 30, 28]. Many malicious activities can leave distinguishable traces in Prefetch files [28, 30]. Even fileless malware, which are memory resident malicious programs, can leave residual trails in Prefetch files after deleting their presence from the file system [13, 19, 7]. Poweliks is one of the first fileless malware samples that can infect a computer with Ransomware [19]. The malware employs evasive techniques to avoid detection from traditional anti-virus software.

Figure 1 shows an example Prefetch file for the CMD.EXE program. The first section has runtime information such as the last-execution timestamp. The second section contains storage information. The third section lists the directories accessed by the program. The final section lists the resource files loaded by the program. The exact format of Prefetch files may vary on different versions of Windows, but the general structure is consistent across all versions. In our model, we only use the list of loaded files from the final section of each Prefetch file.

Figure 1: Example of a Prefetch file for the CMD.EXE program

4 Malware Classification Model

Our model classifies malware into families using information gathered from Prefetch files stored in the Windows Prefetch folder. We use Convolutional Recurrent Neural Networks to implement the components of our classifier. This section describes the architecture of the model and the training process used to create the model.

4.1 Model Architecture

Figure 2 shows the general architecture of our behavioral malware classifier. The first layer is the embedding layer. This layer receives a sequence of resource file names and maps each one to an embedding vector of a chosen size. The number of embedding vectors equals the size of the vocabulary of the model. Each file name corresponds to a unique embedding vector. Embedding vectors generally improve the performance of large neural networks for complex learning problems [33].
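Before an embedding layer can look up vectors, each file name has to be mapped to an integer index. The sketch below shows one common way to do this; the reserved PAD and UNK indices, the upper-casing of names, and the fixed max_len are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List

PAD, UNK = 0, 1  # reserved indices (hypothetical convention)


def build_vocab(traces: List[List[str]]) -> Dict[str, int]:
    """Assign a unique integer index to every file name seen in training."""
    vocab: Dict[str, int] = {}
    for trace in traces:
        for name in trace:
            key = name.upper()  # assumption: treat file names case-insensitively
            if key not in vocab:
                vocab[key] = len(vocab) + 2  # indices 0 and 1 are reserved
    return vocab


def encode(trace: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map a trace to indices, truncating or padding to max_len."""
    ids = [vocab.get(name.upper(), UNK) for name in trace][:max_len]
    return ids + [PAD] * (max_len - len(ids))


traces = [["NTDLL.DLL", "KERNEL32.DLL"], ["ntdll.dll", "USER32.DLL"]]
vocab = build_vocab(traces)
print(encode(["KERNEL32.DLL", "EVIL.DLL"], vocab, max_len=4))
```

Each index then selects one row of the embedding matrix, so the vocabulary size fixes the number of embedding vectors.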

The second layer is a convolutional layer. The layer applies a one-dimensional (1D) sequential filter of a particular size, sliding it over the entire list so that each position covers a window of adjacent file names. This helps the model learn the local relation between embedding vectors. 1D convolutional layers have been used successfully in sequence classification and text classification [23] problems.

The third layer is Max Pooling. This layer reduces the size of the data from the previous layer. It is designed to improve the computational performance and the accuracy of our model and its respective training process. We use the maximum function to select the important representation out of the data.
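The two operations just described can be illustrated on toy data. This is a minimal sketch, assuming a stride of one for the sliding window and non-overlapping pooling windows; the function names are hypothetical, and a real convolutional layer would apply learned filter weights to each window rather than just enumerate it.

```python
from typing import List, Sequence, Tuple


def sliding_windows(seq: Sequence, size: int, stride: int = 1) -> List[Tuple]:
    """Return every window of `size` adjacent items, as a 1D filter would see them."""
    return [tuple(seq[i:i + size]) for i in range(0, len(seq) - size + 1, stride)]


def max_pool_1d(values: Sequence[float], pool_size: int) -> List[float]:
    """Keep the maximum of each non-overlapping window; a trailing partial window is dropped."""
    return [
        max(values[i:i + pool_size])
        for i in range(0, len(values) - pool_size + 1, pool_size)
    ]


print(sliding_windows(["a.dll", "b.dll", "c.dll", "d.dll"], size=2))
print(max_pool_1d([1, 5, 2, 0, 3, 4, 9], pool_size=2))
```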

The fourth layer is Bidirectional LSTM. Bidirectional LSTM (BiLSTM) is an architecture of recurrent neural networks [16]. Recurrent neural networks learn the long-term dependencies between the embedding vectors. In our context, they model the relationship between the resource file names loaded in each Prefetch file. The bidirectional structure consists of a forward and a reversed LSTM, a structure that has been successful in NLP and sequence classification problems [45, 17].

The fifth layer is Global Max Pooling. This layer propagates only relevant information from the sequence of outputs of BiLSTM. It reduces the size of the output of the BiLSTM layer.

The sixth, and final, layer is Softmax. This layer outputs the probability that a malware sample belongs to a specific malware family.
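For reference, the softmax function converts a vector of raw scores into probabilities that sum to one. The sketch below is a plain-Python version with the usual max-subtraction for numerical stability; the input scores are made-up values.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into a probability distribution."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])  # one hypothetical score per malware family
print(probs)
```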

To improve the generalization of our model, we apply regularization techniques, most notably dropout between our model layers. Dropout is a commonly used technique in training large neural networks to reduce overfitting [39]. Dropout has been shown to improve the training and classification performance of large neural networks. The goal is to learn hidden patterns without merely memorizing the samples in the training data. This improves the robustness of the model on unseen (i.e., zero-day) malware samples.

Figure 2: 1D-Conv-BiLSTM model architecture

5 Experimental Setup

This section describes how the dataset and the ground truth labeling used in our experiment were created.

5.1 Dataset Collection

We successfully executed around 100,000 malware samples obtained from the public malware repository VirusShare (http://www.virusshare.com). Malware samples were deployed on a freshly installed Windows 7 system executing on a virtual machine. After each Prefetch file is collected, the virtual machine is reset to a clean (non-infected) state. In order for Windows to generate a Prefetch file for a malware sample, the sample needs to be executed. Once the sample is loaded, Windows generates a Prefetch file automatically. This simplifies the task of extracting the Prefetch files for malicious programs. Our experiments only included malware samples that produced Prefetch files and were identified by major anti-virus engines, such as Kaspersky, EsetNod32, Microsoft, and McAfee.

Type       | Size   | Malware Family Samples
-----------|--------|--------------------------------
Adware     | 0.79%  | MultiPlug, SoftPulse, DomaIQ
Backdoor   | 2.25%  | Advml, Fynloski, Cycbot, Hlux
Trojan     | 89.18% | AntiFW, Buzus, Invader, Kovter
Virus      | 1.44%  | Lamer, Parite, Nimnul, Virut
Worm       | 4.28%  | AutoIt, Socks, VBNA, Generic
Ransomware | 2.07%  | Xorist, Zerber, Blocker, Bitman

Table 1: Malware types, size, and examples of malware families according to the Kaspersky, EsetNod32, Microsoft, and McAfee labels

5.2 Ground Truth Labeling

Ground truth labels for malware were obtained through an online third-party virus scanning service called VirusTotal (http://www.virustotal.com). Given an MD5, SHA1, or SHA256 hash of a malware file, VirusTotal provides the detection information for popular anti-virus engines. This information also includes meta-data such as target platforms, malware types, and malware families for each anti-virus scan engine. Table 1 lists malware types, their share of the dataset, and examples of malware families according to EsetNod32, Kaspersky, Microsoft, and McAfee.

6 Evaluation

This section describes the experimental evaluation of our model against a model from previous work.

6.1 Performance Measurements

The classification accuracy of our model is measured by the F1 score. F1 captures the trade-off between Recall and Precision and combines them into a single metric ranging from 0.0 to 1.0. Recall is the number of relevant examples retrieved divided by the number of all relevant examples. Precision is the number of relevant examples retrieved divided by the number of all retrieved examples. The F1 score formula is:

$$F_1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

A classifier is superior when its F1 score is higher. We choose the F1 score because it is less sensitive to unbalanced classes in training data [11]. Malware training datasets often contain unbalanced samples for different malware families. The ratio between malware family sizes can be as skewed as 1:100. Table 1 shows malware type, size of malware type, and a few examples of malware families.
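To make the metric concrete, the following sketch computes a per-family F1 score and an unweighted (macro) average over families. The macro average is an assumption made here for illustration, since the text does not state how per-family scores are combined; the labels are toy data.

```python
from typing import Dict, List


def f1_per_class(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Compute precision, recall, and F1 separately for every label."""
    scores: Dict[str, float] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


y_true = ["Cerber", "Cerber", "Cerber", "Virut"]
y_pred = ["Cerber", "Cerber", "Virut", "Virut"]
print(f1_per_class(y_true, y_pred), macro_f1(y_true, y_pred))
```

Because every family contributes equally to a macro average, a rare family that is classified poorly lowers the score as much as a common one would.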

Columns: anti-virus label (# of malware families)

Model          | Kaspersky (50) | EsetNod32 (53) | Microsoft (38) | McAfee (55) | F1 mean
---------------|----------------|----------------|----------------|-------------|--------
1D-Conv-BiLSTM | 0.734          | 0.854          | 0.754          | 0.765       | 0.777
LR 2-grams     | 0.711          | 0.821          | 0.734          | 0.756       | 0.756
LR 3-grams     | 0.718          | 0.822          | 0.726          | 0.756       | 0.756
RF 2-grams     | 0.702          | 0.792          | 0.731          | 0.755       | 0.745
RF 3-grams     | 0.671          | 0.699          | 0.72           | 0.724       | 0.704

Table 2: F1 score for 1D-Conv-BiLSTM, LR (2,3)-grams, and RF (2,3)-grams models using Kaspersky, EsetNod32, Microsoft, and McAfee labelings.

6.2 Classification Performance with Common Malware Families

We evaluate our malware classification model against the model from previous work on behavioral malware classification [9]. The previous work examined multiple types of feature extraction, feature selection, and classification models based on large datasets extracted from sequences of OS system calls. The top-performing models were Logistic Regression (LR) and Random Forests (RF). LR and RF were used with n-gram feature extraction and Term Frequency-Inverse Document Frequency (TF-IDF) feature transformation [10]. RF also used Singular Value Decomposition (SVD) for feature dimensionality reduction [22].
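The n-gram and TF-IDF steps used by the baseline models can be sketched as follows. This is an illustrative plain-Python version that assumes the unsmoothed weighting tf × log(N / df); library implementations typically use smoothed variants, and the token sequences here are made up.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Gram = Tuple[str, ...]


def ngrams(seq: Sequence[str], n: int) -> List[Gram]:
    """Return all runs of n consecutive tokens."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def tfidf(docs: List[List[str]], n: int) -> List[Dict[Gram, float]]:
    """Weight each n-gram by its in-document frequency and its rarity across documents."""
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    doc_freq = Counter(gram for c in counts for gram in c)  # documents containing each gram
    n_docs = len(docs)
    weights = []
    for c in counts:
        total = sum(c.values()) or 1
        weights.append(
            {g: (k / total) * math.log(n_docs / doc_freq[g]) for g, k in c.items()}
        )
    return weights


docs = [["open", "read", "close"], ["open", "write", "close"]]
print(tfidf(docs, n=2)[0])
```

An n-gram that occurs in every document receives a weight of zero under this variant, so only n-grams that help tell samples apart contribute to the feature vector.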

We implemented our new model using the Keras and TensorFlow [12, 1] deep learning frameworks. We configured our model using the following parameters:

  • Embedding layer: 100 hidden units

  • 1D Convolutional layer: 250 filters, kernel size of five, stride of one, and ReLU activation function

  • 1D Max Pooling: pool size of four

  • Bidirectional LSTM: 250 hidden units

  • L2 regularization: 0.02

  • Dropout regularization: 0.5

  • Recurrent Dropout regularization: 0.2
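To see how the parameters above shape the data as it flows through the network, the sketch below walks an input sequence through the layers and reports the output shape of each one. The input length of 100 file names and the layer_shapes helper are hypothetical. The sketch also assumes no padding in the convolution, a pooling stride equal to the pool size, and concatenation of the forward and backward LSTM outputs; these match the Keras defaults, but the paper does not state which settings were used.

```python
from typing import List, Tuple


def layer_shapes(seq_len: int, n_families: int) -> List[Tuple[str, tuple]]:
    """Return (layer name, output shape) pairs for one input sequence."""
    embed_dim, filters, kernel, stride, pool, lstm_units = 100, 250, 5, 1, 4, 250
    conv_len = (seq_len - kernel) // stride + 1  # no padding (assumed)
    pool_len = conv_len // pool                  # pool stride = pool size (assumed)
    return [
        ("embedding", (seq_len, embed_dim)),
        ("conv1d", (conv_len, filters)),
        ("max_pooling1d", (pool_len, filters)),
        ("bilstm", (pool_len, 2 * lstm_units)),  # forward + backward concatenated (assumed)
        ("global_max_pooling", (2 * lstm_units,)),
        ("softmax", (n_families,)),
    ]


# Hypothetical input: 100 file names, 53 families (the EsetNod32 count in Table 2).
for name, shape in layer_shapes(seq_len=100, n_families=53):
    print(f"{name:20s} {shape}")
```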

We implemented the LR and RF models from the previous work using Scikit-learn [32]. We applied a grid search to select the best hyperparameters for the LR and RF models.

We train our model using Stochastic Gradient Descent (SGD) with a batch size of 32 samples for 300 epochs [47]. SGD is an iterative optimization algorithm commonly used in training large neural networks. SGD can operate on large training sets using one sample or a small batch of samples at a time. Thus, it is efficient for large training sets and for online training [5].
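The mini-batch update loop can be illustrated on a toy problem. The sketch below fits a line y = w·x + b by least squares, which is far simpler than the network described here; it only demonstrates how each update uses one batch of samples. The batch size of 32 and the 300 epochs mirror the values above, while the learning rate, the data, and the sgd_fit function are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def sgd_fit(xs: List[float], ys: List[float], lr: float = 0.2,
            batch_size: int = 32, epochs: int = 300, seed: int = 0) -> Tuple[float, float]:
    """Fit y = w*x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new order every epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2 * err * xs[i]
                grad_b += 2 * err
            # One parameter update per batch, using the batch-averaged gradient.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


xs = [i / 63 for i in range(64)]
ys = [2 * x + 1 for x in xs]
w, b = sgd_fit(xs, ys)
print(w, b)
```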

We use 10-fold cross-validation with stratified sampling so that each fold preserves the distribution of malware samples across malware families. In each round, we train the models on 9 folds of our dataset and test on the remaining held-out fold. We repeat this experiment 10 times and take the average metric score for the final output. We include any malware family that has a minimum of 50 malware samples.
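A stratified split can be sketched by dealing the samples of each family round-robin across the folds, so that every fold receives roughly the same share of each family. The stratified_folds function, the seed, and the toy labels below are hypothetical; this is one simple way to stratify, not necessarily the procedure used in the experiments.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Partition sample indices into k folds with similar per-family proportions."""
    rng = random.Random(seed)
    by_label: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each family round-robin across folds
    return folds


labels = ["FamilyA"] * 50 + ["FamilyB"] * 100
folds = stratified_folds(labels)
for i, test_idx in enumerate(folds):
    train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
    print(i, len(train_idx), len(test_idx))
```

Each round trains on the nine folds gathered in train_idx and evaluates on the held-out test_idx; averaging the ten scores gives the reported metric.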

Table 2 shows the F1 score results of our experiment using four major anti-virus scan engines: Kaspersky, EsetNod32, Microsoft, and McAfee. The results show that our model outperforms all other models using any anti-virus engine labeling. The second best are the LR models, which outperform the RF models on all anti-virus scan engines and reproduce the results described in [9]. It is noteworthy that 3-gram feature extraction usually provides better results than 2-gram feature extraction in the LR models. However, the 2-gram features outperform the 3-gram features in the RF models.

As shown, the performance of behavioral classification models depends on the anti-virus engine labeling scheme used during training. LR 3-grams perform better than LR 2-grams using the Kaspersky and EsetNod32 labelings, but worse using the Microsoft labeling scheme. Moreover, RF 2-grams underperform all LR models except when using the Microsoft naming scheme. The inconsistency of the results leads researchers to use the anti-virus engine that produces the highest classification score. However, our model shows consistent performance across all major anti-virus engines and outperforms previous work on each of them.

6.3 Classification Performance with Rare Malware Families

Rare malware families with small sample sizes represent a significant percentage of all malware families. With so few samples available during training, it is difficult for models to extract useful behavioral patterns. In this experiment, we include any malware family that has at least 10 malware samples. This presents a challenge for classification models because the number of malware families increases substantially while, at the same time, the number of malware samples for each family decreases. We aim to show the robustness of our classification model when applied to rare malware families.

Table 3 shows the classification performance of our model against LR and RF models using four anti-virus labeling schemes. The table shows that our model consistently outperforms all other models despite the increased number of malware families with a low sample size. For example, on the EsetNod32 labeling scheme, our model's F1 score decreases by only 0.010 (1.0%) when the number of families increases from 53 to 180, while other models exhibit larger classification performance degradations. Specifically, our model shows the smallest decrease in classification performance under every anti-virus labeling scheme.
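The mean Diff (%) column of Table 3 appears to be the average of the four per-engine differences, expressed in percentage points. The short check below reproduces it from the per-engine Diff values reported in the table; this reading of the column is an inference from the numbers, not something the text states.

```python
from typing import Dict, List

# Per-engine Diff values from Table 3 (Kaspersky, EsetNod32, Microsoft, McAfee).
table3_diffs: Dict[str, List[float]] = {
    "1D-Conv-BiLSTM": [-0.088, -0.010, -0.027, -0.045],
    "LR 2-grams": [-0.124, -0.032, -0.078, -0.104],
    "LR 3-grams": [-0.124, -0.032, -0.075, -0.100],
    "RF 2-grams": [-0.114, -0.031, -0.067, -0.097],
    "RF 3-grams": [-0.144, -0.049, -0.093, -0.137],
}


def mean_diff_percent(diffs: List[float]) -> float:
    """Average the per-engine F1 differences and express the result in percentage points."""
    return 100 * sum(diffs) / len(diffs)


for model, diffs in table3_diffs.items():
    print(f"{model}: {mean_diff_percent(diffs):.2f}%")
```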

Columns: anti-virus label (# of malware families)

Model          | Kaspersky (192): F1 | Diff (%) | EsetNod32 (180): F1 | Diff (%) | Microsoft (137): F1 | Diff (%) | McAfee (209): F1 | Diff (%) | F1 mean | Diff (%)
---------------|---------------------|----------|---------------------|----------|---------------------|----------|------------------|----------|---------|---------
1D-Conv-BiLSTM | 0.647               | -0.088   | 0.844               | -0.010   | 0.727               | -0.027   | 0.720            | -0.045   | 0.735   | -4.25%
LR 2-grams     | 0.586               | -0.124   | 0.790               | -0.032   | 0.656               | -0.078   | 0.652            | -0.104   | 0.671   | -8.45%
LR 3-grams     | 0.594               | -0.124   | 0.790               | -0.032   | 0.651               | -0.075   | 0.656            | -0.100   | 0.673   | -8.28%
RF 2-grams     | 0.588               | -0.114   | 0.760               | -0.031   | 0.664               | -0.067   | 0.658            | -0.097   | 0.668   | -7.73%
RF 3-grams     | 0.527               | -0.144   | 0.650               | -0.049   | 0.627               | -0.093   | 0.587            | -0.137   | 0.598   | -10.58%

Table 3: F1 score for 1D-Conv-BiLSTM, LR (2,3)-grams, and RF (2,3)-grams models using Kaspersky, EsetNod32, Microsoft, and McAfee labelings. Diff (%) shows the change of the F1 scores from the previous section after adding rare malware families.

Figure 3: Average F1 scores against the log number of malware samples per family for 1D-Conv-BiLSTM, LR 3-grams, and RF 2-grams using EsetNod32 ground truth labels.

Figure 3 shows the average F1 scores of malware families for LR 3-grams, RF 2-grams, and 1D-Conv-BiLSTM using EsetNod32 ground truth labels. We study the performance of the behavioral classification models on individual malware families to demonstrate the strength of the classification models on common and rare malware families. As shown, the LR model struggles with rare malware families. However, it outperforms the RF model when the number of malware samples in a family increases. Conversely, the RF model performs reasonably on rare malware families, but it underperforms the LR models on common malware families. Ultimately, our 1D-Conv-BiLSTM model outperforms both LR and RF models on almost all common and rare malware families.

6.4 Top Predictions Performance

We also evaluated the capacity of the classification models to find the correct malware family label considering their top k predictions. That is, we measure how the F1 score improves when the top [1,2,…,k] predictions include the correct malware family label. As shown in Figure 4, 1D-Conv-BiLSTM consistently outperforms all of the other models using the top [1,2,…,25] predictions. 1D-Conv-BiLSTM achieves around 0.91, 0.95, and 0.99 F1 on the top 2, 5, and 25 predictions, respectively. This demonstrates that the correct malware family label is almost always (F1 of about 0.99) within the top 25 predictions of our model. The performance of the RF models varies between the (2,3)-grams models, while the LR models achieve similar F1 scores between (2,3)-grams models using top predictions.
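The idea of a top-k prediction can be sketched with a simple hit rate: a sample counts as correct when its true family is among the k highest-scoring families. This is a simpler measure than the top-k F1 reported here and is shown only to illustrate the ranking step; the function names and probability values are made up.

```python
from typing import Dict, List


def top_k_labels(probs: Dict[str, float], k: int) -> List[str]:
    """Return the k labels with the highest predicted probability."""
    return sorted(probs, key=probs.get, reverse=True)[:k]


def top_k_hit_rate(predictions: List[Dict[str, float]], truths: List[str], k: int) -> float:
    """Fraction of samples whose true label is among the top k predictions."""
    hits = sum(truth in top_k_labels(p, k) for p, truth in zip(predictions, truths))
    return hits / len(truths)


predictions = [
    {"Cerber": 0.6, "Virut": 0.3, "Kovter": 0.1},
    {"Cerber": 0.5, "Virut": 0.2, "Kovter": 0.3},
]
truths = ["Cerber", "Kovter"]
print(top_k_hit_rate(predictions, truths, k=1), top_k_hit_rate(predictions, truths, k=2))
```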

Figure 4: The F1 scores for behavioral classification models when top k predictions are used to find the correct malware family label according to EsetNod32 ground truth labeling.

The LR (2,3)-grams models outperform the RF models up to the top 5 predictions. From the top 5 predictions onward, the RF 2-grams model outperforms the LR models. The RF 3-grams model, which achieves the lowest classification performance in our experiment, matches the corresponding LR model performance when considering the top 25 predictions. This shows that RF models have a higher capacity to find the correct malware families within the top candidates. The reason might be related to the fact that a Random Forest is an ensemble of decision trees [6], and it is known that ensemble models often overcome the limitations of stand-alone classification models [44]. Our model consistently outperforms the LR and RF models on the top k predictions.

6.5 Classification Performance with New Malware Families

Behavioral malware classification models need to learn the behavior of newly discovered malware continuously. This presents a challenge since the rate of malware sample discovery is high. Therefore, it is more efficient and practical to train an existing model incrementally than to re-train it from scratch on newly discovered samples. Incremental training provides a practical solution to assimilate new malware behavioral information into the classification models without impacting the classification performance.

In this experiment, we evaluate our pre-trained model’s ability to learn the behavior of new malware samples quickly. We train our model on all malware families that were discovered between 2010 and 2016. Then, we add malware families that were discovered in 2017 to the training dataset and incrementally re-train the model to create a new classification model. We aim to show that incrementally re-training an existing model is more efficient and adaptive than training a new model from scratch.

Figure 5: The F1 scores for newly trained and incrementally trained 1D-Conv-BiLSTM models on the test dataset during training.

Figure 5 shows the classification performance of our models during training. The experiment shows that the incrementally re-trained model achieves a higher F1 at early stages during training than the newly trained model. Therefore, the training process can be shortened to reduce the overhead of training on new malware samples. Moreover, incremental re-training of our model is efficient and recommended over fully re-training the model.

7 Conclusion

We introduce a new behavioral malware classification model for the Microsoft Windows platform. Our model extracts features from the Windows Prefetch files. We show the effectiveness of our classification technique on a large malware collection and ground truth labels from 4 major anti-virus vendors.

We also evaluate our models on rare malware families with a small number of malware samples. Despite the increasing number of malware families, our model still outperforms other state-of-the-art models. Moreover, we demonstrate our model’s ability to continuously learn the behavior of new malware families, which reduces the time and overhead of the training process.

In the future, we would like to improve our ground truth labeling by combining all major scan engine labels to increase the performance and robustness of our classification model. We would also like to test our model on evolving malware families over time.

References

  • [1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. Tensorflow: A system for large-scale machine learning. In OSDI, volume 16, pages 265–283, 2016.
  • [2] M. Alazab, S. Venkatraman, P. Watters, and M. Alazab. Zero-day malware detection based on supervised learning algorithms of api call signatures. In Proceedings of the Ninth Australasian Data Mining Conference-Volume 121, pages 171–182. Australian Computer Society, Inc., 2011.
  • [3] M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, and J. Nazario. Automated classification and analysis of internet malware. In International Workshop on Recent Advances in Intrusion Detection, pages 178–197. Springer, 2007.
  • [4] B. Blunden. The Rootkit arsenal: Escape and evasion in the dark corners of the system. Jones & Bartlett Publishers, 2012.
  • [5] L. Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010, pages 177–186. Springer, 2010.
  • [6] L. Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
  • [7] S. D. Candid Wueest and H. Anand. The increased use of PowerShell in attacks. https://www.symantec.com/content/dam/symantec/docs/security-center/white-papers/increased-use-of-powershell-in-attacks-16-en.pdf, 2016. [Online; accessed 10-Jan-2017].
  • [8] J. Canto, M. Dacier, E. Kirda, and C. Leita. Large scale malware collection: lessons learned. In IEEE SRDS Workshop on Sharing Field Data and Experiment Measurements on Resilience of Distributed Computing Systems. Citeseer, 2008.
  • [9] R. Canzanese, S. Mancoridis, and M. Kam. Run-time classification of malicious processes using system call analysis. In Malicious and Unwanted Software (MALWARE), 2015 10th International Conference on, pages 21–28. IEEE, 2015.
  • [10] W. Cavnar. Using an n-gram-based document representation with a vector processing retrieval model. NIST SPECIAL PUBLICATION SP, pages 269–269, 1995.
  • [11] N. V. Chawla. Data mining for imbalanced datasets: An overview. In Data mining and knowledge discovery handbook, pages 853–867. Springer, 2005.
  • [12] F. Chollet et al. Keras (2015), 2017.
  • [13] A. Dove. Fileless malware–a behavioural analysis of kovter persistence. 2016.
  • [14] M. Egele, T. Scholte, E. Kirda, and C. Kruegel. A survey on automated dynamic malware-analysis techniques and tools. ACM Computing Surveys (CSUR), 44(2):6, 2012.
  • [15] E. Filiol. Malware pattern scanning schemes secure against black-box analysis. Journal in Computer Virology, 2(1):35–50, 2006.
  • [16] I. Goodfellow, Y. Bengio, and A. Courville. Deep learning, volume 1. MIT Press, Cambridge, 2016.
  • [17] A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional lstm and other neural network architectures. Neural Networks, 18(5):602–610, 2005.
  • [18] K. Heller, K. Svore, A. D. Keromytis, and S. Stolfo. One class support vector machines for detecting anomalous windows registry accesses. In Workshop on Data Mining for Computer Security (DMSEC), Melbourne, FL, November 19, 2003, pages 2–9, 2003.
  • [19] B. S. Rivera and R. U. Inocencio. Doing more with less: A study of fileless infection attacks. https://www.virusbulletin.com/uploads/pdf/conference_slides/2015/RiveraInocencio-VB2015.pdf, September 30, 2015. [Online; accessed 19-Jan-2017].
  • [20] G. Jacob, H. Debar, and E. Filiol. Behavioral detection of malware: from a survey towards an established taxonomy. Journal in computer Virology, 4(3):251–266, 2008.
  • [21] A. Kantchelian, M. C. Tschantz, S. Afroz, B. Miller, V. Shankar, R. Bachwani, A. D. Joseph, and J. D. Tygar. Better malware ground truth: Techniques for weighting anti-virus vendor labels. In Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, pages 45–56. ACM, 2015.
  • [22] H. Kim, P. Howland, and H. Park. Dimension reduction in text classification with support vector machines. In Journal of Machine Learning Research, pages 37–53, 2005.
  • [23] Y. Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.
  • [24] J. Z. Kolter and M. A. Maloof. Learning to detect and classify malicious executables in the wild. Journal of Machine Learning Research, 7(Dec):2721–2744, 2006.
  • [25] C. Kruegel, E. Kirda, D. Mutz, W. Robertson, and G. Vigna. Polymorphic worm detection using structural information of executables. In International Workshop on Recent Advances in Intrusion Detection, pages 207–226. Springer, 2005.
  • [26] W. Lee, S. J. Stolfo, and K. W. Mok. A data mining framework for building intrusion detection models. In Security and Privacy, 1999. Proceedings of the 1999 IEEE Symposium on, pages 120–132. IEEE, 1999.
  • [27] P. Li, L. Liu, D. Gao, and M. K. Reiter. On challenges in evaluating malware clustering. In International Workshop on Recent Advances in Intrusion Detection, pages 238–255. Springer, 2010.
  • [28] C. H. Malin, E. Casey, and J. M. Aquilina. Malware Forensics Field Guide for Windows Systems: Digital Forensics Field Guides. Elsevier, 2011.
  • [29] J. A. Marpaung, M. Sain, and H.-J. Lee. Survey on malware evasion techniques: State of the art and challenges. In Advanced Communication Technology (ICACT), 2012 14th International Conference on, pages 744–749. IEEE, 2012.
  • [30] D. Molina, M. Zimmerman, G. Roberts, M. Eaddie, and G. Peterson. Timely rootkit detection during live response. In IFIP International Conference on Digital Forensics, pages 139–148. Springer, 2008.
  • [31] R. Moskovitch, C. Feher, N. Tzachar, E. Berger, M. Gitelman, S. Dolev, and Y. Elovici. Unknown malcode detection using opcode representation. In Intelligence and Security Informatics, pages 204–215. Springer, 2008.
  • [32] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in python. Journal of machine learning research, 12(Oct):2825–2830, 2011.
  • [33] J. Pennington, R. Socher, and C. Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014.
  • [34] R. Perdisci et al. Vamo: towards a fully automated malware clustering validity analysis. In Proceedings of the 28th Annual Computer Security Applications Conference, pages 329–338. ACM, 2012.
  • [35] C. Raiu. A virus by any other name: Virus naming practices. Security Focus, 2002.
  • [36] K. Rieck, P. Trinius, C. Willems, and T. Holz. Automatic analysis of malware behavior using machine learning. Journal of Computer Security, 19(4):639–668, 2011.
  • [37] A.-D. Schmidt, R. Bye, H.-G. Schmidt, J. Clausen, O. Kiraz, K. A. Yuksel, S. A. Camtepe, and S. Albayrak. Static analysis of executables for collaborative malware detection on android. In Communications, 2009. ICC’09. IEEE International Conference on, pages 1–5. IEEE, 2009.
  • [38] M. G. Schultz, E. Eskin, E. Zadok, and S. J. Stolfo. Data mining methods for detection of new malicious executables. In Security and Privacy, 2001. S&P 2001. Proceedings. 2001 IEEE Symposium on, pages 38–49. IEEE, 2001.
  • [39] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958, 2014.
  • [40] S. J. Stolfo, F. Apap, E. Eskin, K. Heller, S. Hershkop, A. Honig, and K. Svore. A comparative evaluation of two algorithms for windows registry anomaly detection. Journal of Computer Security, 13(4):659–693, 2005.
  • [41] P. Szor. The art of computer virus research and defense. Pearson Education, 2005.
  • [42] M. Venable, A. Walenstein, M. Hayes, C. Thompson, and A. Lakhotia. Vilo: a shield in the malware variation battle. Virus Bulletin, pages 5–10, 2007.
  • [43] K. Wang and S. Stolfo. One-class training for masquerade detection. 2003.
  • [44] M. Woźniak, M. Graña, and E. Corchado. A survey of multiple classifier systems as hybrid systems. Information Fusion, 16:3–17, 2014.
  • [45] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
  • [46] N. Ye and Q. Chen. An anomaly detection technique based on a chi-square statistic for detecting intrusions into information systems. Quality and Reliability Engineering International, 17(2):105–112, 2001.
  • [47] T. Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the twenty-first international conference on Machine learning, page 116. ACM, 2004.