An MDL-Based Classifier for Transactional Datasets with Application in Malware Detection

10/09/2019
by Behzad Asadi, et al.

We design a classifier for transactional datasets, with application to malware detection. The classifier is built on the minimum description length (MDL) principle: for each class, we select the model that best compresses that class's training dataset under the MDL criterion. To select a model for a dataset, we first apply clustering followed by closed frequent pattern mining to extract a subset of closed frequent patterns (CFPs). We show that this procedure acts as a pattern summarization method that avoids pattern explosion; it does so by giving priority to longer CFPs, and it does not require extracting all CFPs. We then use the MDL criterion to further summarize the extracted patterns and to construct a code table of patterns. This code table is the selected model for compressing the dataset. We evaluate our classifier on the problem of static malware detection in portable executable (PE) files. We take the API calls of PE files as their distinguishing features, so that the presence or absence of each API call forms a transactional dataset. Using the proposed method, we construct two code tables, one for the benign training dataset and one for the malware training dataset. Our dataset consists of 19696 benign and 19696 malware samples, each represented as a binary sequence of length 22761. We compare our classifier with deep neural networks, which provide state-of-the-art performance. The comparison shows that our classifier performs very close to deep neural networks. We also discuss why our classifier is interpretable. This motivates the use of this type of classifier where some degree of explanation is required as to why a sample is assigned to one class rather than the other.
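To make the role of the two code tables concrete, the following is a minimal sketch of how such tables could be used at classification time. It assumes, as is common for code-table-based MDL classifiers, that a sample is assigned to the class whose code table encodes it in fewer bits; the abstract does not spell out the decision rule, so this is an assumption rather than the authors' stated procedure. The pattern usage counts, the greedy longest-first cover, the penalty `UNSEEN_ITEM_BITS`, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical penalty, in bits, for an API call that no pattern in a code table covers.
UNSEEN_ITEM_BITS = 16.0


def code_lengths(usage):
    """Turn pattern usage counts into Shannon code lengths, -log2(usage / total usage)."""
    total = sum(usage.values())
    return {p: -math.log2(c / total) for p, c in usage.items() if c > 0}


def encoded_length(transaction, lengths):
    """Greedily cover the transaction with patterns, longest first, and sum the bits used."""
    remaining = set(transaction)
    bits = 0.0
    # Sort by descending pattern size; ties are broken alphabetically for determinism.
    for pattern in sorted(lengths, key=lambda p: (-len(p), sorted(p))):
        if pattern <= remaining:
            bits += lengths[pattern]
            remaining -= pattern
    # Any API call left uncovered is charged the fixed penalty.
    return bits + UNSEEN_ITEM_BITS * len(remaining)


def classify(transaction, tables):
    """Return the label of the code table that yields the shortest encoding."""
    return min(tables, key=lambda label: encoded_length(transaction, tables[label]))


# Toy code tables (illustrative usage counts, not taken from the paper).
benign_usage = {
    frozenset({"CreateFileW", "ReadFile"}): 6,
    frozenset({"GetMessageW"}): 2,
}
malware_usage = {
    frozenset({"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"}): 6,
    frozenset({"CreateFileW"}): 2,
}
tables = {
    "benign": code_lengths(benign_usage),
    "malware": code_lengths(malware_usage),
}

sample = {"VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread", "CreateFileW"}
print(classify(sample, tables))
print(classify({"CreateFileW", "ReadFile"}, tables))
```

In this sketch, the patterns that cover a sample under the winning code table also serve as a rough explanation of the decision, which is in the spirit of the interpretability claim above.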

