I. Introduction
Machine learning algorithms have been widely deployed as automated decision-making tools, owing to their generalization capabilities, in applications such as autonomous driving and cybersecurity, where false decisions can have serious consequences. However, machine learning algorithms have been shown to be vulnerable in adversarial settings. An adversary can influence the decisions of a machine learning model both in the training phase and in the test phase.
In this work, we address the problem of adversarial examples in the test phase, where an adversary modifies original samples to misguide a trained classifier. Modifications cannot be arbitrary; they must preserve the functionalities of the original samples. Most of the existing work on this problem has been in the area of image classification, where adversarial examples are generated by adding perturbations to normal images; the perturbations must be imperceptible to human eyes to be considered functionality-preserving. There has also been some work in the area of malware detection, which is also the application considered in this work. In malware detection, adversarial examples are generated by modifying malware samples to be misclassified as benign samples; the modifications need to satisfy a set of constraints defined to preserve the functionalities of the original malware samples.
Adversarial examples can be classified based on the adversary's specificity into targeted and non-targeted examples [1]. Targeted examples are generated to be misclassified into a specific wrong class; non-targeted examples are generated to be misclassified into an arbitrary wrong class. Adversarial examples can also be classified based on the adversary's knowledge of the classifier into white-box and black-box examples [1]. White-box examples are generated by an adversary that has access to all the parameters of the target trained classifier [2, 3, 4, 5, 6, 7]. In white-box scenarios, adversarial examples are generated using gradient-based methods, where the gradient can be computed because all the parameters of the classifier are known. Black-box examples are generated by an adversary that does not have access to the parameters of the target trained classifier; the adversary has access only to the output of the classifier, in the form of either a soft decision (confidence score) [8, 9] or a hard decision (label) [10, 11, 12, 13, 14, 15, 16]. In black-box scenarios where the adversary has access to the hard-decision output of the classifier, adversarial examples are mainly generated by first constructing a substitute model for the target model and then applying white-box methods to the substitute model. This idea is based on the transferability of adversarial examples [17], which says that adversarial examples constructed to misguide a specific classifier can also misguide another classifier with a totally different architecture.
As the final objective, we are trying to improve the robustness of machine learning algorithms by taking adversarial examples into account. There are existing defense mechanisms to make a classifier more robust to adversarial examples, such as defensive distillation [18] and adversarial training [19, 20, 21]. Distillation helps the classifier generalize better to slightly modified samples and consequently become more robust to adversarial examples. In adversarial training, adversarial examples are generated and utilized during the retraining process; both white-box and black-box examples are considered, so the classifier is exposed to such examples during training.
I-A. Existing Works and Contributions
In this work, we address the problem of adversarial examples in a black-box scenario where the adversary has access to the feature space and the hard-decision output of the target classifier. We propose an approach to generate adversarial examples using the minimum description length (MDL) principle.
We consider the application of static malware detection in portable executable (PE) files, where API calls are used to decide whether a file is benign or malicious. As mentioned earlier, in malware detection, adversarial examples are generated by making functionality-preserving modifications to original malware samples such that they are misclassified as benign samples. Hu and Tan [14] addressed this application using a dataset consisting of 160 different API calls. Their approach is based on constructing a substitute model for the target classifier and using generative adversarial networks [22]. We address this application using a dataset with a much larger number of features; our dataset consists of 22761 unique API calls.
In our MDL-based approach, we first create a dataset of samples all identified as benign by querying the target classifier. We then construct a code table of frequent patterns for the compression of samples in this benign dataset using the MDL principle. We finally generate an adversarial example corresponding to a malware sample by selecting a pattern from the benign code table and adding it to the malware sample. The selected pattern is the one that minimizes the length of the compressed adversarial example given the benign code table. Note that, in our method, we do not construct a substitute model, as we only need a dataset of benign samples. Also, our method preserves the functionalities of malware samples, as only some new API calls are added to malware samples without removing any existing ones. Considering a neural network as the classifier, using our method, the evasion rate for adversarial examples is 78.24 percent compared to 8.16 percent for original malware samples. This shows the necessity of considering these generated adversarial examples in rebuilding the neural network.
II. Static Malware Detection
Malware detection is one of the areas to which machine learning algorithms have been able to contribute. Traditional algorithms for malware detection search for known patterns, which requires them to have a copy of every malware sample. These algorithms are not effective nowadays because (i) polymorphism is used within a malware family, (ii) the number of new malware families is growing rapidly, and (iii) they are not capable of zero-day malware detection. This makes machine learning algorithms good candidates for automated malware detection: they can extract complex patterns using different attributes of a malware sample, and they can also help with zero-day malware detection as they generalize to new samples [23].
Malware detection can be divided into two main categories of dynamic (behavioral) and static (code) malware detection. In dynamic malware detection, samples are executed, and their runtime behavior is monitored to create indicators of malicious activities. In static malware detection, binary codes of samples are examined without executing them to create indicators of malicious activities.
As mentioned earlier, we consider static malware detection in PE files. Different types of features have been used for this task, such as API calls [6], byte-level N-grams [24], features from the PE header [25], and combinations of different types of features [26]. We consider the API calls of PE files to distinguish between malware and benign samples. The presence/absence of API calls forms a transactional dataset, which is explained in the following section.
III. Transactional Datasets
In this section, we present some preliminaries for transactional datasets required in this work. A transactional dataset, denoted by $\mathcal{D}$, is a non-empty multiset (bag) of transactions, i.e., $\mathcal{D} = \{t_1, t_2, \dots, t_N\}$. Each transaction $t$ is a subset of $\mathcal{I}$, where $\mathcal{I}$ represents the set of all items (i.e., $t \subseteq \mathcal{I}$). We say that a transaction $t$ supports an itemset $X$ (which is also a subset of $\mathcal{I}$) if $X \subseteq t$. The support of an itemset $X$, denoted by $\mathrm{supp}(X)$, is the number of transactions that support the itemset. Considering that $\mathcal{T}_X$ is the multiset of transactions that support the itemset $X$, and $\mathcal{T}_Y$ is the multiset of transactions that support the itemset $Y$, we therefore have
$$\mathrm{supp}(X) = |\mathcal{T}_X|,$$
$$\mathrm{supp}(X \cup Y) = |\mathcal{T}_X \cap \mathcal{T}_Y|,$$
$$\text{If } X \subseteq Y, \text{ then } \mathcal{T}_Y \subseteq \mathcal{T}_X,$$
$$\text{If } X \subseteq Y, \text{ then } \mathrm{supp}(X) \geq \mathrm{supp}(Y),$$
where $|\cdot|$ denotes the cardinality of the multiset. An itemset is considered to be frequent if its support is greater than or equal to a user-decided threshold, denoted by $s_{\min}$. A frequent itemset is closed if it has no superset with the same support.
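To make these definitions concrete, support and the closure check can be sketched in a few lines of Python. This is a toy brute-force miner for illustration only; for real datasets the LCM algorithm is used, as described later in this paper.

```python
from itertools import combinations

def support(itemset, transactions):
    """supp(X): number of transactions t with X a subset of t."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t)

def closed_frequent_itemsets(transactions, min_sup):
    """Brute-force CFP mining: frequent itemsets with no proper
    superset of equal support (fine only for tiny datasets)."""
    items = sorted(set().union(*transactions))
    freq = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = support(cand, transactions)
            if s >= min_sup:
                freq[frozenset(cand)] = s
    return {x: s for x, s in freq.items()
            if not any(x < y and s == sy for y, sy in freq.items())}

D = [{1, 2}, {1, 2}, {1, 2, 3}, {3}]
cfps = closed_frequent_itemsets(D, min_sup=1)
# {1} and {2} are not closed: their superset {1,2} has the same support of 3
```

Here the closed frequent itemsets of the toy dataset are $\{1,2\}$ (support 3), $\{3\}$ (support 2), and $\{1,2,3\}$ (support 1).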
IV. MDL Principle and Its Applications
In this section, we present the MDL principle and its applications to classification and pattern summarization.
IV-A. MDL Principle
Kolmogorov complexity theory, also known as algorithmic information theory, was developed to measure the information in objects in isolation, i.e., without knowing the distribution underlying the object. Since in data mining we normally do not know the underlying distribution of our data, we use algorithmic information theory to measure the information in our data. The Kolmogorov complexity of an object is the descriptive complexity of that object, i.e., the length of the shortest computer program that can describe the object. This is formally defined as follows [27].
Definition 1
The Kolmogorov complexity of an object $x$ with respect to a universal computer $\mathcal{U}$, denoted by $K_{\mathcal{U}}(x)$, is defined as
$$K_{\mathcal{U}}(x) = \min_{p \,:\, \mathcal{U}(p) = x} l(p),$$
which is the minimum length over all programs $p$ that print $x$ and halt.
However, the Kolmogorov complexity of an object cannot be computed. Therefore, in practice, the MDL principle is utilized. Using the crude MDL version, we choose a model $M$ from a set of models $\mathcal{M}$ that minimizes the two-term objective function $L(\mathcal{D} \mid M) + L(M)$, where $L(\mathcal{D} \mid M)$ is the number of bits required to describe the object given the model, and $L(M)$ is the number of bits required to describe the model itself. Hence, based on the crude MDL, we have
$$M^{*} = \arg\min_{M \in \mathcal{M}} \; L(\mathcal{D} \mid M) + L(M).$$
IV-B. MDL-Based Classifier
We here explain how to utilize the MDL principle to build a binary classifier. Supervised learning consists of two phases: training and test. In the training phase, we select a model for the training dataset of each class based on the MDL criterion,
$$M_i^{*} = \arg\min_{M \in \mathcal{M}} \; L(\mathcal{D}_i \mid M) + L(M), \quad i = 1, 2.$$
In the test phase, if for the transaction $t$ we have
$$L(t \mid M_1^{*}) > L(t \mid M_2^{*}),$$
this implies that $t$ is compressed better using the model of the second class. Consequently, we classify the sample under the second class; otherwise, we classify it under the first class. Note that the term $L(M)$ in the crude MDL criterion prevents the model from overfitting during the training phase: by using a complex (overfitted) model, we can minimize the term $L(\mathcal{D} \mid M)$, so using only this term as the selection criterion can result in overfitting. By considering both $L(\mathcal{D} \mid M)$ and $L(M)$ in the selection criterion, this scenario can be avoided.
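The decision rule above can be prototyped without any pattern mining by using an off-the-shelf compressor as a crude stand-in for $L(t \mid M)$: approximate the cost of a sample under a class model by the extra bytes it adds to the compressed class corpus. This is a sketch of the idea only; zlib and the toy corpora are our assumptions, not the model family used in this paper.

```python
import zlib

def extra_cost(corpus: bytes, sample: bytes) -> int:
    """Crude proxy for L(t | M): extra compressed bytes the sample
    costs on top of the class corpus."""
    base = len(zlib.compress(corpus, 9))
    return len(zlib.compress(corpus + sample, 9)) - base

def mdl_classify(sample: bytes, corpus_1: bytes, corpus_2: bytes) -> int:
    """Assign the sample to the class whose 'model' compresses it better."""
    return 1 if extra_cost(corpus_1, sample) <= extra_cost(corpus_2, sample) else 2

benign = b"open read write close " * 200
malicious = b"inject hook spawn encrypt " * 200
label = mdl_classify(b"read write close ", benign, malicious)
```

With these toy corpora, the benign corpus pays fewer extra bytes for the sample (it is a repeat of material already there), so `label` comes out as 1.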
IV-C. MDL-Based Pattern Summarization
The MDL principle can be used for pattern summarization, where we want to select a small subset of an existing large set of candidate patterns, denoted by $\mathcal{F}$. In this part, we present the algorithm proposed by Vreeken et al. [28], which uses the MDL principle for pattern summarization. This algorithm performs pattern summarization by searching among code tables of patterns as the family of models to describe the data. A code table, denoted by $CT$, has two columns: the first column consists of selected patterns, and the second column consists of binary codes used to encode the patterns in the first column. This algorithm, which basically outputs a semi-adaptive compression dictionary, selects the best code table as
$$CT^{*} = \arg\min_{CT} \; L(\mathcal{D} \mid CT) + L(CT). \tag{1}$$
In the algorithm proposed by Vreeken et al. [28], as the search space for constructing code tables is very large, a heuristic approach is used to select the best code table. This heuristic approach consists of three steps. In the first step, candidate patterns in the set $\mathcal{F}$ are ordered descending first by their support and second by their length. In the second step, a standard code table consisting of all singleton items is constructed. In the third step, candidate patterns from the ordered $\mathcal{F}$ are examined one by one: if adding a candidate pattern to the current code table results in a smaller objective function, i.e., a smaller $L(\mathcal{D} \mid CT) + L(CT)$, it is kept in the code table; otherwise, it is dropped. This leads to keeping only a small subset of $\mathcal{F}$ in the final code table. The final code table is considered as the model selected by the MDL principle, and the patterns in the final code table are considered as the patterns chosen by the MDL principle.

We here explain how the two terms $L(\mathcal{D} \mid CT)$ and $L(CT)$ in equation (1) are calculated. The first term in equation (1), $L(\mathcal{D} \mid CT)$, is calculated as
$$L(\mathcal{D} \mid CT) = \sum_{t \in \mathcal{D}} \; \sum_{X \in \mathrm{cover}(t)} l(\mathrm{code}(X)),$$
where $l(\mathrm{code}(X))$ is the length of the binary code for the pattern $X$ in the second column, and $\mathrm{cover}(t)$ is the set of patterns used to cover the transaction $t$. The patterns covering a transaction satisfy the following properties:
$$X \cap Y = \emptyset \quad \text{for all distinct } X, Y \in \mathrm{cover}(t),$$
and
$$\bigcup_{X \in \mathrm{cover}(t)} X = t.$$
As there can be several ways (different sets of patterns) to cover a transaction, the patterns in the code table are ordered descending first by their length and next by their support; the patterns are selected according to this order to cover a transaction.
The lengths of the binary codes in the second column of the code table, i.e., $l(\mathrm{code}(X))$, are determined by the Shannon code, which is a prefix code. The more a pattern is used in the cover of transactions, the shorter its code. Therefore, by defining the usage of a pattern $X$ as
$$\mathrm{usage}(X) = |\{\, t \in \mathcal{D} : X \in \mathrm{cover}(t) \,\}|,$$
the code for the pattern $X$ is of length
$$l(\mathrm{code}(X)) = -\log \left( \frac{\mathrm{usage}(X)}{\sum_{Y \in CT} \mathrm{usage}(Y)} \right).$$
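The greedy cover order and the usage-based Shannon code lengths can be sketched as follows. This is a sketch under the definitions above; representing the code table as a mapping from patterns to their supports is our assumption.

```python
import math

def cover(transaction, code_table):
    """Greedily cover a transaction with disjoint code-table patterns,
    tried in code-table order: descending length, then descending support."""
    remaining = set(transaction)
    used = []
    for pattern, _supp in sorted(code_table.items(),
                                 key=lambda kv: (-len(kv[0]), -kv[1])):
        if pattern <= remaining:
            used.append(pattern)
            remaining -= pattern
        if not remaining:
            break
    return used

def code_lengths(database, code_table):
    """Shannon code length -log2(usage / total usage) for each used pattern."""
    usage = {p: 0 for p in code_table}
    for t in database:
        for p in cover(t, code_table):
            usage[p] += 1
    total = sum(usage.values())
    return {p: -math.log2(u / total) for p, u in usage.items() if u > 0}

# Toy example: code table as pattern -> support in the database.
D = [{1, 2, 3}, {1, 2}, {3}]
CT = {frozenset(p): s for p, s in [({1, 2}, 2), ({1}, 2), ({2}, 2), ({3}, 2)]}
lengths = code_lengths(D, CT)
```

In this toy run, $\{1,2\}$ and $\{3\}$ are each used twice and all other singletons go unused, so both used patterns get a one-bit code.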
The second term in equation (1), $L(CT)$, is calculated as
$$L(CT) = \sum_{i \in \mathcal{I}} n_i \log(|\mathcal{I}| + 1) + |CT| \log(|\mathcal{I}| + 1) + \sum_{X \in CT} l(\mathrm{code}(X)), \tag{2}$$
where $n_i$ is the number of times that item $i$ appears in the patterns in the first column of the code table. The number of all possible symbols in the first column of the code table, considering a separator between each two patterns, is $|\mathcal{I}| + 1$. The first two terms on the right-hand side of equation (2) correspond to encoding the first column of the code table. The last term on the right-hand side of equation (2) corresponds to encoding the second column of the code table, consisting of prefix binary codes.
IV-C1. Example
We here provide an example for pattern summarization. In this example, we consider the following dataset which consists of five items and 10 transactions.
1  2  3  4  5 

1  1  1  1  0 
1  1  1  1  0 
1  1  0  1  0 
0  1  1  1  1 
0  0  1  1  1 
0  0  0  1  1 
0  1  0  0  0 
0  0  1  0  0 
0  0  0  1  0 
0  0  0  0  1 
Each row represents a transaction. This dataset can be represented as
$$\mathcal{D} = \{\, \{1,2,3,4\}^2, \{1,2,4\}, \{2,3,4,5\}, \{3,4,5\}, \{4,5\}, \{2\}, \{3\}, \{4\}, \{5\} \,\},$$
where $\mathcal{D}$ is a multiset, and the superscript for an element shows the multiplicity of that element. We perform closed frequent pattern mining (CFPM) with $s_{\min} = 1$ to extract all closed frequent patterns (CFPs) of this dataset. This is to form the list of candidate patterns required to construct an MDL-based code table for this dataset. In this work, we use the Linear Time Closed Itemset Mining (LCM) algorithm [29] for CFPM. Using the extracted CFPs, the ordered list of candidate patterns is
support  pattern
7  {4}
5  {2}
5  {3}
4  {2,4}
4  {3,4}
4  {5}
3  {1,2,4}
3  {2,3,4}
3  {4,5}
2  {1,2,3,4}
2  {3,4,5}
1  {2,3,4,5}
and the final code table using the described approach is

pattern  usage  binary code length
{1,2,4}  3  log2(18/3) ≈ 2.58
{4}  4  log2(18/4) ≈ 2.17
{2}  2  log2(18/2) ≈ 3.17
{3}  5  log2(18/5) ≈ 1.85
{5}  4  log2(18/4) ≈ 2.17
{1}  0  (smoothed; see below)

This shows the effectiveness of the MDL principle for pattern summarization: only one non-singleton pattern, {1,2,4}, is kept out of the twelve candidates. In the second column of the code table, we have provided the lengths of the binary codes rather than the binary codes themselves, because the lengths are more important than the codes themselves. Note that item 1 does not appear in the cover of any transaction, i.e., its usage is equal to zero. We keep all singleton items in the final code table by giving them a small usage when their usage is zero; this is to be able to cover any unseen transactions.
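The usages in this example can be re-derived with a short script. This is a sketch; it assumes the final table keeps {1,2,4} as its only non-singleton pattern (consistent with item 1 having zero usage, as noted above) and covers transactions in order of descending pattern length, then support.

```python
import math

# The ten transactions of the example dataset.
D = [{1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 4}, {2, 3, 4, 5}, {3, 4, 5},
     {4, 5}, {2}, {3}, {4}, {5}]

# Assumed final code table: the non-singleton pattern {1,2,4} plus all
# singletons, listed in cover order (descending length, then support).
CT = [frozenset(p) for p in ({1, 2, 4}, {4}, {2}, {3}, {5}, {1})]

usage = {p: 0 for p in CT}
for t in D:
    remaining = set(t)
    for p in CT:                      # greedy cover in code-table order
        if p <= remaining:
            usage[p] += 1
            remaining -= p

total = sum(usage.values())
length = {p: -math.log2(u / total) for p, u in usage.items() if u > 0}
```

This yields usages 3, 4, 2, 5, 4, and 0 for the listed patterns (18 in total), confirming that item 1 is covered only through the pattern {1,2,4}.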
V. MDL-Based Model Selection
In this section, we explain our recently proposed MDL-based model-selection method [30], shown in Fig. 1. The method used for the example in Section IV-C can be computationally very expensive for large datasets, because we may face pattern explosion when extracting all CFPs. To address this problem, we have recently proposed a method in which we use clustering in conjunction with CFPM to form the list of candidate patterns. We have shown that this approach extracts a subset of all CFPs by giving priority to longer CFPs. This is important as the compression is mainly achieved through longer patterns. In our method, we first cluster the dataset. We use the Clustering with sLOPE (CLOPE) algorithm [31], which is a fast algorithm for clustering transactional datasets. In the CLOPE algorithm, we do not need to know the number of clusters in advance, but we need to set the maximum number of clusters. This can be decided based on a parameter of the algorithm: the larger the parameter, the smaller the maximum number of clusters. For a large $s_{\min}$, we do not face pattern explosion, and therefore we do not need clustering. After clustering, we rank clusters according to the following criterion
$$Q_k = \frac{H_k}{N_k}, \tag{3}$$
where $H_k$ and $N_k$ are the height and the number of transactions of cluster $k$, respectively. The height of cluster $k$ is defined as
$$H_k = \frac{S_k}{W_k},$$
where $S_k$ is the total number of item occurrences in cluster $k$ and $W_k$ is the number of distinct items in cluster $k$. The cluster quality $Q_k$ takes a value between zero and one. It is equal to one when all the transactions of a cluster are the same (the highest quality). We next select a subgroup of clusters as high-quality (HQ) clusters by setting a quality threshold, and perform CFPM only in HQ clusters. In HQ clusters, transactions share the majority of their items, and as a result the number of CFPs in these clusters is not large even for a small $s_{\min}$. Low-quality (LQ) clusters are the main reason for pattern explosion, and the output of CFPM in these clusters consists of mainly short patterns.
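The quality criterion can be computed directly from a cluster's transactions. This is a sketch assuming the CLOPE-style height, i.e., total item occurrences divided by the number of distinct items.

```python
def cluster_quality(cluster):
    """Q = H / N with height H = S / W: S item occurrences, W distinct
    items (width), N transactions. Q = 1 iff all transactions coincide."""
    n = len(cluster)
    s = sum(len(t) for t in cluster)
    w = len(set().union(*cluster))
    return (s / w) / n

high_q = cluster_quality([{1, 2, 3}] * 4)          # identical transactions -> 1.0
low_q = cluster_quality([{1, 2}, {3, 4}, {5, 6}])  # fully disjoint -> 1/3
```

A cluster of identical transactions scores 1.0, while fully disjoint transactions score 1/N, matching the intuition that HQ clusters are the ones whose transactions share most of their items.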
As the output of the pattern-mining stage, we take the union over the outputs of CFPM in the HQ clusters. We finally construct a code table of patterns according to Section IV-C as the selected model.
VI. Proposed MDL-Based Adversarial Examples
In this section, we propose an MDL-based method to generate adversarial examples. In our recent work [30], we designed an MDL-based classifier and showed that one of the main advantages of using the MDL principle for the task of classification is interpretability. This means that, to some extent, we can explain the reasons why a sample is classified under a specific class rather than another. As we saw in Section IV-B, the better a sample is compressed using the model of a class, the higher the probability that the sample belongs to that class. Interpretability is therefore about finding the reasons why a sample can be compressed better using the model of a specific class, which can be done by considering the structure of the family of models chosen for compression. This motivates utilizing the MDL principle to define a metric for generating adversarial examples: by knowing the reasons why a sample is compressed better using a model, it is possible to modify the sample to have a shorter compressed version under the model of a wrong class. This increases the probability that the sample is classified under the wrong class.
We consider a black-box scenario where the adversary has access only to the input and the hard-decision output of the target classifier. In our method, we first select a class and construct a dataset $\mathcal{D}_s$ in which all samples are classified under the selected class by the target classifier. This is done by querying the target classifier. We then choose a model that best describes this dataset using the MDL principle,
$$M^{*} = \arg\min_{M \in \mathcal{M}} \; L(\mathcal{D}_s \mid M) + L(M). \tag{4}$$
Now, considering that $t$ is a sample not belonging to the selected class, we generate an adversarial example $t^{*}$ corresponding to $t$ based on the following metric:
$$t^{*} = \arg\min_{t' \in \mathcal{A}(t)} \; L(t' \mid M^{*}), \tag{5}$$
where $\mathcal{A}(t)$ represents the set of all modified versions of $t$ with the same functionalities. In our approach, we are actually trying to choose the vector from $\mathcal{A}(t)$ that has the maximum compression given $M^{*}$. This is to misguide the classifier into identifying the adversarial example as a member of the selected class.
In the application of malware detection, adversarial examples are generated by modifying malware samples to be misclassified as benign samples. Therefore, we first need to construct a dataset of samples all identified as benign by the target classifier. We then, using (4), choose a model that best describes the benign dataset. We finally, using (5), generate adversarial examples corresponding to original malware samples.
Note that, in our proposed approach, we do not construct a substitute model for the original classifier. This is as opposed to black-box methods in which, first, a substitute model is constructed, and then a white-box method is employed to generate adversarial examples for the substitute model. For instance, in the application of malware detection, creating a substitute model requires a dataset of malware samples in addition to a dataset of benign samples. In our approach, we only need a dataset of benign samples, which is easier to construct than a dataset of malware samples.
VI-A. Algorithm
We here propose an algorithm as a suboptimal implementation of our approach for the application of static malware detection in PE files. API calls of PE files are used as distinguishing features. As mentioned in Section II, the presence/absence of API calls forms a transactional dataset. In order to preserve the functionalities of original malware samples, we define $\mathcal{A}(t) = \{\, t' : t \subseteq t' \,\}$. This allows us to only add some API calls to a malware sample, without removing any existing ones, in order to generate an adversarial example.
In our algorithm, shown as Algorithm 1, we first create a dataset of samples all identified as benign by the target classifier, denoted by $\mathcal{D}_b$. We next construct a code table of patterns, $CT$, for this dataset using the MDL principle, as described in Section V. We finally generate an adversarial example for a malware sample $t$ by selecting a pattern $X$ from the final code table $CT$ and adding it to the malware sample. The selected pattern is the one that minimizes $L(t' \mid CT)$, where $t' = t \cup X$.
In our algorithm, we do not need to search among the singleton patterns in $CT$. This is because adding a singleton item can only lead to a smaller $L(t' \mid CT)$ compared to the original $L(t \mid CT)$ if it forms a longer pattern with some of the existing items; as we check all non-singleton patterns, we do not need to check singleton patterns. This is helpful, as we normally have a small number of non-singleton patterns in the final code table $CT$, and consequently our search space is much smaller compared to considering all patterns.
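The search step of the algorithm reduces to a few lines once the code table and code lengths are available. This is a sketch; the ordered table and the made-up code lengths below are hypothetical illustrations, not values from the paper.

```python
def compressed_len(t, code_table, lengths):
    """L(t' | CT): greedily cover t' in code-table order, sum code lengths."""
    remaining, bits = set(t), 0.0
    for p in code_table:
        if p <= remaining:
            bits += lengths[p]
            remaining -= p
        if not remaining:
            break
    return bits

def best_adversarial(t, code_table, lengths):
    """Add the non-singleton pattern X minimizing L(t ∪ X | CT).
    Only API calls are added, so functionality is preserved."""
    candidates = [p for p in code_table if len(p) > 1]
    best = min(candidates,
               key=lambda p: compressed_len(set(t) | p, code_table, lengths))
    return set(t) | best

# Hypothetical code table in cover order, with made-up code lengths in bits.
CT = [frozenset(p) for p in ({1, 2}, {3, 4}, {1}, {2}, {3}, {4})]
L = {frozenset({1, 2}): 1.5, frozenset({3, 4}): 1.5,
     frozenset({1}): 3.0, frozenset({2}): 3.0,
     frozenset({3}): 3.0, frozenset({4}): 3.0}
adv = best_adversarial({1}, CT, L)
```

For the sample {1}, adding {1,2} yields a cover of 1.5 bits versus 4.5 bits for adding {3,4}, so `adv` is {1, 2}; the original item 1 is still present, mirroring the functionality-preserving constraint.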
VI-B. Example
We here provide an example for our method. In this example, we consider the dataset in the example of Section IV-C as the dataset of benign samples constructed by querying the target classifier, i.e., $\mathcal{D}_b = \mathcal{D}$. Now let us assume that we have a malware sample $t = \{2, 4\}$, and we are going to generate an adversarial example corresponding to this sample. Considering the code table for this dataset, presented in Section IV-C, we can see that $L(t \mid CT) = l(\mathrm{code}(\{2\})) + l(\mathrm{code}(\{4\}))$, but by adding the pattern $\{1,2,4\}$ to this sample, i.e., $t' = t \cup \{1,2,4\} = \{1,2,4\}$, we have $L(t' \mid CT) = l(\mathrm{code}(\{1,2,4\})) < L(t \mid CT)$. As discussed in the last section, we only need to search among non-singleton patterns, of which there is only one here; this example confirms that the restriction can make our search space much smaller. The same result is achieved by adding the singleton pattern $\{1\}$ to the original sample, but only because this item together with the existing items forms the longer pattern $\{1,2,4\}$; therefore, we do not need to check this singleton when considering non-singleton patterns.
VII. Performance Evaluation
In this section, we evaluate our proposed algorithm, described in Section VI-A, for the application of static malware detection in PE files where API calls are used as features.
VII-A. Dataset
We use the dataset provided by Al-Dujaili et al. [6]. Our dataset consists of 14772 benign training samples, 14772 malware training samples, 4924 benign test samples, and 4924 malware test samples. The total number of API calls in the dataset is 22761. Therefore, each sample of the dataset is a binary vector of size 22761, where the locations of the ones determine the API calls of that sample.
VII-B. Neural Network and Its Performance
We use fully connected feedforward neural networks to establish the state-of-the-art performance on our dataset. We use five-fold cross-validation to optimize the hyperparameters of our network. Our network consists of five layers: one input layer of size 22761, three hidden layers of size 300, and one output layer of size two. The rectified linear unit (ReLU) is used as the activation function in the hidden layers, and the softmax function is used in the output layer. We use a dropout rate of 50 percent to avoid overfitting. The size of the mini-batches is 100 samples, the learning rate of the Adam optimizer is 0.0001, and the number of epochs is 50. The accuracy, false positive rate (FPR), and false negative rate (FNR) obtained by this network are 91.94, 7.96, and 8.16 percent, respectively. This means that the evasion rate for original malware samples is 8.16 percent.
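The reported rates are mutually consistent: with the 4924 + 4924 test split, they imply roughly 392 false positives and 402 false negatives. The confusion counts below are our back-calculation, not figures reported in the paper.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, false positive rate, false negative rate (in percent).
    Positives are malware, so the FNR is also the malware evasion rate."""
    acc = 100 * (tp + tn) / (tp + fp + tn + fn)
    fpr = 100 * fp / (fp + tn)   # benign flagged as malware
    fnr = 100 * fn / (fn + tp)   # malware classified as benign
    return acc, fpr, fnr

# Implied confusion counts for 4924 benign + 4924 malware test samples.
acc, fpr, fnr = detection_metrics(tp=4522, fp=392, tn=4532, fn=402)
```

Rounded to two decimals, this reproduces the 91.94, 7.96, and 8.16 percent figures above.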
VII-C. Evasion Rate of Adversarial Examples
We use the trained neural network presented in the last section to test our proposed algorithm for constructing adversarial examples. To construct the benign dataset required in our algorithm, we use the benign test dataset consisting of 4924 samples, and remove the ones that are identified as malware by the trained neural network. This dataset, $\mathcal{D}_b$, is then used to construct the code table required for generating adversarial examples. Note that the benign test samples are independent of the benign training samples used to train our target neural network.
To create the list of candidate patterns required for code table construction, we could directly use the LCM algorithm [29] to extract all CFPs in $\mathcal{D}_b$. However, we face pattern explosion in our dataset for a small $s_{\min}$. To avoid pattern explosion, we use our recently proposed approach [30] presented in Section V. We have shown that this approach acts as a pattern-summarization method by giving priority to longer patterns and without requiring the extraction of all CFPs. Following this approach, we cluster $\mathcal{D}_b$ using the CLOPE algorithm [31] with the repulsion factor equal to four and the maximum cluster number equal to 16. In the CLOPE algorithm, the repulsion factor is used to control intra-cluster similarity; a larger repulsion factor leads to clusters in which transactions share more common items. The clustering provides us with 16 clusters of qualities 0.20, 0.51, 0.71, 0.91, 0.75, 0.87, 0.51, 0.15, 0.50, 0.37, 0.31, 0.35, 0.34, 0.86, 0.29, and 0.08. We consider only the cluster with quality 0.08 as a low-quality cluster, and consider the remaining 15 clusters as high-quality clusters. We then apply the LCM algorithm to the high-quality clusters separately with a small minimum support $s_{\min}$. The list of candidate patterns is created by taking the union over the outputs of the LCM algorithm for the high-quality clusters.
After creating the list of candidate patterns, we construct $CT$ as described in Section IV-C. We finally generate one adversarial example corresponding to each malware test sample. This is done by selecting a pattern from $CT$ and adding it to each malware test sample. The selected pattern is the one that minimizes the length of the compressed adversarial example given $CT$. The new evasion rate for adversarial examples is 78.24 percent, which shows the effectiveness of our algorithm.
VIII. Discussion
In this section, we present some discussion on the properties of our proposed method for generating adversarial examples, and also on the adversarial-training defense mechanism.
As discussed in Section VI, one of the main properties of our method is that we do not need to build a substitute model for the target classifier. This makes our method more practical in scenarios where it is difficult to collect samples for all the existing classes to build a substitute model. In our method, to generate an adversarial example corresponding to a sample, we only need a dataset of samples for a wrong class. Another property of our method is that it is general and can be used in different applications. Equations (4) and (5) presented in Section VI are the two key equations in our method, and they can be made specific to a particular application. In this work, we have done this for the application of malware detection using API calls: we chose a specific family of models in the MDL principle, defined a set of constraints to preserve the functionalities of original samples, and provided an algorithm for finding the minimum in equation (5).
After generating adversarial examples, as mentioned in the introduction, one of the main defense mechanisms is adversarial training [19, 20, 21]. In adversarial training, both normal and adversarial examples are considered during the training process, i.e., the training dataset is augmented with adversarial examples, both white-box and black-box ones. However, this method can be considered a brute-force method [32], and it has not been very successful in improving the robustness of classifiers. As discussed by Ross and Doshi-Velez [32], we also think that explainability/interpretability of a classifier can help us improve its robustness in adversarial settings. We think that explainability can make adversarial training more successful by guiding us to add specific adversarial examples during the training process. Methods to interpret machine learning models are classified into two classes: intrinsic and post hoc methods [33]. Intrinsic interpretability is when a machine learning model itself is interpretable due to its structure. Post hoc interpretability is when a method is developed to interpret the decisions of a machine learning model after its training. Machine learning models that are intrinsically interpretable can also be used as a post hoc method by approximating the main model in order to explain its decisions. In our recent work, we have shown that we can use the MDL principle to build an intrinsically interpretable classifier [30].
IX. Conclusion
We proposed a method to generate adversarial examples using the minimum description length (MDL) principle. This is to improve the robustness of classifiers by considering these examples in their design process. We assumed that the adversary has access only to the feature set and the final hard-decision output of the target classifier. We evaluated our method for the application of static malware detection in portable executable (PE) files. In malware detection, adversarial examples are generated by making functionality-preserving modifications to original malware samples so that they are misclassified as benign samples. Our method requires only a dataset of samples all identified as benign by the target classifier, which can be constructed by querying the target classifier. We considered a neural network that detects malware samples in PE files using their API calls; a feature vector is then a binary vector where the locations of the ones determine the existing API calls. We showed that the evasion rate is 78.24 percent for adversarial examples compared to 8.16 percent for original malware samples. This was done without changing the functionalities of the malware samples.
References

[1] X. Yuan, P. He, Q. Zhu, and X. Li, “Adversarial examples: Attacks and defenses for deep learning,” IEEE Trans. Neural Netw. Learn. Syst., early access, 2019.
[2] I. J. Goodfellow, J. Shlens, and C. Szegedy. (2015, Mar. 20) Explaining and harnessing adversarial examples. [Online]. Available: https://arxiv.org/abs/1412.6572v3
[3] A. Kurakin, I. J. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” in Proc. Int. Conf. Learn. Represent. (ICLR), Toulon, France, Apr. 2017.
[4] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in Proc. IEEE European Symp. Security and Privacy (EuroS&P), Saarbrücken, Germany, Mar. 2016, pp. 372–387.
[5] S.-M. Moosavi-Dezfooli, A. Fawzi, and P. Frossard, “DeepFool: A simple and accurate method to fool deep neural networks,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, June 2016, pp. 2574–2582.
[6] A. Al-Dujaili, A. Huang, E. Hemberg, and U.-M. O’Reilly, “Adversarial deep learning for robust detection of binary encoded malware,” in Proc. IEEE Security and Privacy Workshops (SPW), San Francisco, USA, May 2018, pp. 76–82.
[7] K. Grosse, N. Papernot, P. Manoharan, M. Backes, and P. McDaniel, “Adversarial examples for malware detection,” in Proc. European Symp. Research Comp. Security (ESORICS), Oslo, Norway, Sep. 2017.
[8] A. N. Bhagoji, W. He, B. Li, and D. Song. (2017, Dec. 27) Exploring the space of black-box attacks on deep neural networks. [Online]. Available: https://arxiv.org/abs/1712.09491v1
[9] W. Xu, Y. Qi, and D. Evans, “Automatically evading classifiers: A case study on PDF malware classifiers,” in Proc. Network and Distributed System Security Symp. (NDSS), San Diego, USA, Feb. 2016.
[10] N. Papernot, P. McDaniel, I. J. Goodfellow, S. Jha, Z. B. Celik, and A. Swami. (2016) Practical black-box attacks against machine learning. [Online]. Available: https://arxiv.org/abs/1602.02697
[11] Z. Zhao, D. Dua, and S. Singh. (2018, Feb. 23) Generating natural adversarial examples. [Online]. Available: https://arxiv.org/abs/1710.11342v2
[12] C. Xiao, B. Li, J. Zhu, W. He, M. Liu, and D. Song. (2019, Feb. 14) Generating adversarial examples with adversarial networks. [Online]. Available: https://arxiv.org/abs/1801.02610v5
[13] H. S. Anderson, J. Woodbridge, and B. Filar, “DeepDGA: Adversarially-tuned domain generation and detection,” in Proc. ACM Workshop on Artificial Intelligence and Security (AISec), Vienna, Austria, Oct. 2016.
[14] W. Hu and Y. Tan. (2017, Feb. 20) Generating adversarial malware examples for black-box attacks based on GAN. [Online]. Available: https://arxiv.org/abs/1702.05983v1
[15] I. Rosenberg, A. Shabtai, L. Rokach, and Y. Elovici. (2018, June 24) Generic black-box end-to-end attack against state-of-the-art API call based malware classifiers. [Online]. Available: https://arxiv.org/abs/1707.05970v5
[16] H. S. Anderson, A. Kharkar, and B. Filar, “Evading machine learning malware detection,” in Proc. Black Hat, Las Vegas, USA, July 2017.
[17] N. Papernot, P. McDaniel, and I. J. Goodfellow. (2016, May 24) Transferability in machine learning: From phenomena to black-box attacks using adversarial samples. [Online]. Available: https://arxiv.org/abs/1605.07277v1
[18] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami, “Distillation as a defense to adversarial perturbations against deep neural networks,” in Proc. IEEE Symp. Security and Privacy (S&P), San Jose, USA, May 2016, pp. 582–597.
[19] R. Huang, B. Xu, D. Schuurmans, and C. Szepesvári. (2016, Jan. 16) Learning with a strong adversary. [Online]. Available: https://arxiv.org/abs/1511.03034v6
[20] A. Kurakin, I. J. Goodfellow, and S. Bengio. (2017, Feb. 11) Adversarial machine learning at scale. [Online]. Available: https://arxiv.org/abs/1611.01236v2
[21] F. Tramèr, A. Kurakin, N. Papernot, I. J. Goodfellow, D. Boneh, and P. McDaniel. (2018, July 22) Ensemble adversarial training: Attacks and defenses. [Online]. Available: https://arxiv.org/abs/1705.07204v4
[22] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. Neural Information Processing Systems (NIPS), Montreal, Canada, Dec. 2014, pp. 2672–2680.
[23] M. G. Schultz, E. Eskin, E. Zadok, and S. J. Stolfo, “Data mining methods for detection of new malicious executables,” in Proc. IEEE Symp. Security and Privacy (S&P), Oakland, USA, May 2001, pp. 38–49.
[24] J. Z. Kolter and M. A. Maloof, “Learning to detect and classify malicious executables in the wild,” Journal of Machine Learning Research, vol. 7, pp. 2721–2744, Dec. 2006.
[25] M. Z. Shafiq, S. M. Tabish, F. Mirza, and M. Farooq, “PE-Miner: Mining structural information to detect malicious executables in realtime,” in Proc. Int. Symp. Recent Advances in Intrusion Detection (RAID), Saint-Malo, France, Sep. 2009, pp. 121–141.
[26] H. S. Anderson and P. Roth. (2018, Apr. 16) EMBER: An open dataset for training static PE malware machine learning models. [Online]. Available: https://arxiv.org/abs/1804.04637v2
[27] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2006.
[28] J. Vreeken, M. van Leeuwen, and A. Siebes, “Krimp: Mining itemsets that compress,” Data Min. Knowl. Disc., vol. 23, no. 1, pp. 169–214, July 2011.
[29] T. Uno, T. Asai, Y. Uchida, and H. Arimura, “An efficient algorithm for enumerating closed patterns in transaction databases,” in Proc. 7th Int. Conf. Discovery Science (DS), Padova, Italy, Oct. 2004, pp. 16–31.
[30] B. Asadi and V. Varadharajan. (2019, Oct. 9) An MDL-based classifier for transactional datasets with application in malware detection. [Online]. Available: https://arxiv.org/abs/1910.03751v1
[31] Y. Yang, X. Guan, and J. You, “CLOPE: A fast and effective clustering algorithm for transactional data,” in Proc. Eighth ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD), Edmonton, Canada, July 2002, pp. 682–687.
[32] A. S. Ross and F. Doshi-Velez. (2017, Nov. 27) Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. [Online]. Available: https://arxiv.org/abs/1711.09404v1
[33] C. Molnar, Interpretable Machine Learning, 2019. [Online]. Available: https://christophm.github.io/interpretable-ml-book/