Activation Analysis of a Byte-Based Deep Neural Network for Malware Classification

03/12/2019
by   Scott E. Coull, et al.

Feature engineering is one of the most costly aspects of developing effective machine learning models, and that cost is even greater in specialized problem domains, like malware classification, where expert skills are necessary to identify useful features. Recent work, however, has shown that deep learning models can be used to automatically learn feature representations directly from the raw, unstructured bytes of the binaries themselves. In this paper, we explore what these models are learning about malware. To do so, we examine the learned features at multiple levels of resolution, from individual byte embeddings to end-to-end analysis of the model. At each step, we connect these byte-oriented activations to their original semantics through parsing and disassembly of the binary to arrive at human-understandable features. Through our results, we identify several interesting features learned by the model and their connection to manually-derived features typically used by traditional machine learning models. Additionally, we explore the impact of training data volume and regularization on the quality of the learned features and the efficacy of the classifiers, revealing the somewhat paradoxical insight that better generalization does not necessarily result in better performance for byte-based malware classifiers.

research
03/03/2020

DeepMAL – Deep Learning Models for Malware Traffic Detection and Classification

Robust network security systems are essential to prevent and mitigate th...
research
03/24/2021

CNN vs ELM for Image-Based Malware Classification

Research in the field of malware classification often relies on machine ...
research
10/24/2019

Neurlux: Dynamic Malware Analysis Without Feature Engineering

Malware detection plays a vital role in computer security. Modern machin...
research
06/12/2022

Fusing Feature Engineering and Deep Learning: A Case Study for Malware Classification

Machine learning has become an appealing signature-less approach to dete...
research
06/14/2016

Adversarial Perturbations Against Deep Neural Networks for Malware Classification

Deep neural networks, like many other machine learning models, have rece...
research
10/20/2022

On Feature Learning in the Presence of Spurious Correlations

Deep classifiers are known to rely on spurious features – patterns w...
research
06/23/2021

First Step Towards EXPLAINable DGA Multiclass Classification

Numerous malware families rely on domain generation algorithms (DGAs) to...
