Virus-MNIST: Machine Learning Baseline Calculations for Image Classification

11/03/2021
by Erik Larsen, et al.

The Virus-MNIST data set is a collection of thumbnail images similar in style to the ubiquitous MNIST hand-written digits. These images, however, are produced by reshaping possible malware code into an image array. The data set is therefore poised to serve as a benchmark for progress in training virus classifier models. Ten classes are present: nine classified as malware and one benign. Cursory examination reveals unequal class populations and other key aspects that must be considered when selecting classification and pre-processing methods. Exploratory analyses show possible identifying characteristics in aggregate metrics (e.g., the pixel median values) and ways to reduce the number of features by identifying strong correlations. A model comparison shows that the Light Gradient Boosting Machine, Gradient Boosting Classifier, and Random Forest algorithms produced the highest accuracy scores, making them promising candidates for deeper scrutiny.
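The two exploratory steps named in the abstract, aggregate pixel medians per class and feature reduction through strong correlations, can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' procedure: the function names, the 8x8 image size, the 0.95 threshold, and the greedy selection rule are all assumptions made for the example.

```python
import numpy as np


def per_class_pixel_median(images, labels):
    """Map each class label to the median pixel value over all its images."""
    return {int(c): float(np.median(images[labels == c]))
            for c in np.unique(labels)}


def drop_correlated_features(X, threshold=0.95):
    """Return indices of columns to keep (hypothetical greedy rule).

    A column is kept only if its absolute Pearson correlation with every
    previously kept column is at or below `threshold`. Constant columns are
    dropped first because their correlation is undefined.
    """
    candidates = np.flatnonzero(X.std(axis=0) > 0)
    corr = np.abs(np.atleast_2d(np.corrcoef(X[:, candidates], rowvar=False)))
    kept = []
    for j in range(len(candidates)):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return candidates[kept]


# Synthetic stand-in for the data set: 10 classes, 20 images each.
# The 8x8 shape is arbitrary and chosen only to keep the example small.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 20)
images = rng.integers(0, 256, size=(labels.size, 8, 8), dtype=np.uint8)

medians = per_class_pixel_median(images, labels)

# Flatten each image into a feature vector, then plant one redundant
# feature (a linear copy of column 0) and one constant feature.
X = images.reshape(len(images), -1).astype(float)
X[:, 1] = 2.0 * X[:, 0] + 3.0
X[:, 2] = 0.0

kept = drop_correlated_features(X, threshold=0.95)
print("per-class medians:", medians)
print("features kept:", len(kept), "of", X.shape[1])
```

On real data the class medians would be compared against one another to see whether they separate the classes, and the threshold would need tuning, since a greedy rule like this depends on column order.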

research
07/01/2021

Overhead-MNIST: Machine Learning Baselines for Image Classification

Twenty-three machine learning algorithms were trained then scored to est...
research
11/03/2021

Intrusion Detection: Machine Learning Baseline Calculations for Image Classification

Cyber security can be enhanced through application of machine learning b...
research
11/30/2021

New Datasets for Dynamic Malware Classification

Nowadays, malware and malware incidents are increasing daily, even with ...
research
09/26/2018

Classification of malware based on file content and characteristics

In general, the industry of malware has come to be a market which brings...
research
02/28/2021

Virus-MNIST: A Benchmark Malware Dataset

The short note presents an image classification dataset consisting of 10...
research
06/09/2020

Tamil Vowel Recognition With Augmented MNIST-like Data Set

We report generation of a MNIST [4] compatible data set [1] for Tamil vo...
research
07/07/2020

Machine Learning Classification of Kuiper Belt Populations

In the outer solar system, the Kuiper Belt contains dynamical sub-popula...