Virus-MNIST: Machine Learning Baseline Calculations for Image Classification

by   Erik Larsen, et al.

The Virus-MNIST data set is a collection of thumbnail images similar in style to the ubiquitous MNIST hand-written digits. These images, however, are produced by reshaping possibly malicious code into an image array. The data set is therefore well placed to serve as a benchmark for progress in training virus classifier models. Ten classes are present: nine classified as malware and one benign. A cursory examination reveals unequal class populations and other key aspects that must be considered when selecting classification and pre-processing methods. Exploratory analyses show potentially identifying characteristics in aggregate metrics (e.g., the pixel median values) and ways to reduce the number of features by identifying strong correlations. A model comparison shows that the Light Gradient Boosting Machine, Gradient Boosting Classifier, and Random Forest algorithms produce the highest accuracy scores and therefore merit deeper scrutiny.
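The two exploratory steps mentioned above (aggregate pixel metrics per class, and feature reduction through strong correlations) might look roughly like the sketch below. It is an illustration only, not the authors' code: the helper names `per_class_median` and `drop_correlated_features`, the 0.95 correlation threshold, the 8x8 image size, and the random stand-in data are all assumptions made for this example.

```python
import numpy as np


def per_class_median(images, labels):
    """Median pixel value over all images of each class (hypothetical helper)."""
    return {int(c): float(np.median(images[labels == c])) for c in np.unique(labels)}


def drop_correlated_features(X, threshold=0.95):
    """Return the column indices to keep (hypothetical helper).

    Constant columns are dropped first because their correlation is undefined.
    A remaining column is dropped when its absolute Pearson correlation with
    an already kept column exceeds `threshold` (an assumed cut-off).
    """
    candidates = np.flatnonzero(X.std(axis=0) > 0)
    if candidates.size == 0:
        return candidates
    corr = np.atleast_2d(np.abs(np.corrcoef(X[:, candidates], rowvar=False)))
    keep = []
    for j in range(candidates.size):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return candidates[keep]


# Random stand-in data, NOT the real Virus-MNIST images: 200 thumbnails of
# arbitrary size 8x8, with ten class labels (0-9).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(200, 8, 8)).astype(np.float64)
labels = rng.integers(0, 10, size=200)

# Flatten each image into a feature row, then plant one duplicated pixel
# and one constant pixel so the reduction step has something to remove.
X = images.reshape(len(images), -1).copy()
X[:, 1] = X[:, 0]
X[:, 2] = 0.0

medians = per_class_median(images, labels)
kept = drop_correlated_features(X)
print(f"{len(medians)} classes summarised, {kept.size} of {X.shape[1]} pixels kept")
```

On real data, the per-class medians would be compared across the ten classes, and the kept column indices would define the reduced feature set passed to the classifiers.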



Overhead-MNIST: Machine Learning Baselines for Image Classification

Twenty-three machine learning algorithms were trained then scored to est...

Intrusion Detection: Machine Learning Baseline Calculations for Image Classification

Cyber security can be enhanced through application of machine learning b...

New Datasets for Dynamic Malware Classification

Nowadays, malware and malware incidents are increasing daily, even with ...

Classification of malware based on file content and characteristics

In general, the industry of malware has come to be a market which brings...

Virus-MNIST: A Benchmark Malware Dataset

The short note presents an image classification dataset consisting of 10...

Tamil Vowel Recognition With Augmented MNIST-like Data Set

We report generation of a MNIST [4] compatible data set [1] for Tamil vo...

Machine Learning Classification of Kuiper Belt Populations

In the outer solar system, the Kuiper Belt contains dynamical sub-popula...