New Datasets for Dynamic Malware Classification

11/30/2021
by Berkant Düzgün, et al.

Malware and malware incidents are increasing daily, despite the availability of various antivirus systems and malware detection and classification methodologies. Many static, dynamic, and hybrid techniques have been presented to detect malware and classify it into malware families. Dynamic and hybrid malware classification methods have an advantage over static methods in being highly efficient. Since it is more difficult to mask malware behavior during execution than to mask the underlying code examined in static malware classification, machine learning techniques have been the main focus of security experts for detecting malware and determining its family dynamically. The rapid increase in malware also creates a need for recent, updated datasets of malicious software. We introduce two new, updated datasets in this work: one with 9,795 samples obtained and compiled from VirusSamples and one with 14,616 samples from VirusShare. This paper also analyzes the multi-class malware classification performance of the balanced and imbalanced versions of these two datasets using Histogram-based gradient boosting, Random Forest, Support Vector Machine, and XGBoost models with API call-based dynamic malware classification. Results show that Support Vector Machine achieves the highest score of 94% [...] 91% [...] most common gradient boosting-based models, achieves the highest score of 90% and 80% [...] the baseline results of VirusShare and VirusSample datasets by using the four most widely known machine learning techniques in dynamic malware classification literature. We believe that these two datasets and baseline results enable researchers in this field to test and validate their methods and approaches.

