Machine Learning With Feature Selection Using Principal Component Analysis for Malware Detection: A Case Study

02/10/2019
by Jason Zhang, et al.

Cyber security threats have grown significantly in both volume and sophistication over the past decade, which makes malware detection very challenging without considerable automation. In this paper, we propose a novel approach that extends our recently suggested artificial neural network (ANN) based model with feature selection using the principal component analysis (PCA) technique for malware detection. The effectiveness of the approach is demonstrated through its application to PDF malware detection. A varying number of principal components is examined in a comparative study. Our evaluation shows that the model with PCA can significantly reduce feature redundancy and learning time with minimal information loss, as confirmed by both training and testing results based on around 105,000 real-world PDF documents. Of the evaluated models using PCA, the model with 32 principal feature components exhibits very similar training accuracy to the model using the 48 original features, resulting in around 33% dimensionality reduction and 22% less learning time. The testing results confirm this effectiveness and show that the model is able to achieve a 93.17% true positive rate (TPR) while maintaining the same low false positive rate (FPR) of 0.08% as in the case when no feature selection is applied. This significantly outperforms all seven evaluated well-known commercial antivirus (AV) scanners, of which the best has a TPR of only 84.53%.
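
As a rough illustration of the reduction step described in the abstract, the sketch below projects a 48-feature matrix onto its first 32 principal components and computes TPR and FPR from binary predictions. It is only a minimal sketch under stated assumptions: the data is synthetic rather than the paper's PDF feature set, the ANN classifier is not reproduced, and the helper names `pca_fit`, `pca_transform`, and `tpr_fpr` are hypothetical, not taken from the paper.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA on the rows of X via SVD of the centered data.

    Returns the feature means, the top principal directions (one per row),
    and the fraction of total variance those directions retain.
    """
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    variance_ratio = (s ** 2) / np.sum(s ** 2)
    return mean, vt[:n_components], variance_ratio[:n_components].sum()


def pca_transform(X, mean, components):
    """Project X onto the fitted principal directions."""
    return (X - mean) @ components.T


def tpr_fpr(y_true, y_pred):
    """True positive rate and false positive rate for binary labels (1 = malicious)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    return tp / (tp + fn), fp / (fp + tn)


# Synthetic stand-in for the 48 original features (assumption: correlated
# features generated from 20 latent factors plus a little noise).
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 20))
mixing = rng.normal(size=(20, 48))
X = latent @ mixing + 0.01 * rng.normal(size=(1000, 48))

mean, components, kept = pca_fit(X, n_components=32)
Z = pca_transform(X, mean, components)

# 48 -> 32 features is a reduction of 1 - 32/48, i.e. about 33%.
reduction = 1 - Z.shape[1] / X.shape[1]
print(f"reduced shape: {Z.shape}, dimensionality reduction: {reduction:.0%}")
print(f"variance retained by 32 components: {kept:.4f}")
```

In a pipeline of this kind, the reduced matrix `Z` would replace `X` as the classifier input, and the projection fitted on the training set would be reused unchanged on the test set. Whether 32 components is a good cut-off depends on how much variance they retain for the data at hand.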

research
08/29/2023

Assessing Cyclostationary Malware Detection via Feature Selection and Classification

Cyclostationarity involves periodic statistical variations in signals an...
research
08/21/2018

MLPdf: An Effective Machine Learning Based Approach for PDF Malware Detection

Due to the popularity of portable document format (PDF) and increasing n...
research
11/25/2020

Feature Selection based on Principal Component Analysis for Underwater Source Localization by Deep Learning

In this paper, we propose an interpretable feature selection method base...
research
07/13/2017

Automation of Feature Engineering for IoT Analytics

This paper presents an approach for automation of interpretable feature ...
research
01/15/2023

A Review on the effectiveness of Dimensional Reduction with Computational Forensics: An Application on Malware Analysis

The Android operating system is pervasively adopted as the operating sys...
research
08/09/2023

Gaussian Image Anomaly Detection with Greedy Eigencomponent Selection

Anomaly detection (AD) in images, identifying significant deviations fro...
research
04/08/2017

Seismic facies recognition based on prestack data using deep convolutional autoencoder

Prestack seismic data carries much useful information that can help us f...
