Malware Detection by Eating a Whole EXE

by Edward Raff, et al.

In this work we introduce malware detection from raw byte sequences as a fruitful research area to the larger machine learning community. Building a neural network for such a problem presents a number of interesting challenges that have not occurred in tasks such as image processing or NLP. In particular, we note that detection from raw bytes presents a sequence problem with over two million time steps, and one where batch normalization appears to hinder the learning process. We present our initial work in building a solution to this problem, which has linear complexity dependence on the sequence length and allows interpretable sub-regions of the binary to be identified. In doing so, we discuss the many challenges in building a neural network to process data at this scale, and the methods we used to work around them.
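To make the "raw byte sequence" framing concrete, here is a minimal sketch of how an executable might be turned into a fixed-length sequence with one time step per byte. This is not the authors' preprocessing: the function name `bytes_to_sequence`, the cap `MAX_LEN`, and the padding token `PAD_ID` are assumptions introduced only for illustration.

```python
from typing import List

MAX_LEN = 2_000_000  # assumed cap, chosen to match the "two million time steps" scale
PAD_ID = 256         # hypothetical padding token, distinct from byte values 0..255


def bytes_to_sequence(raw: bytes, max_len: int = MAX_LEN) -> List[int]:
    """Map raw bytes to one integer token per byte, truncated or padded to max_len."""
    tokens = list(raw[:max_len])  # iterating over bytes yields ints in 0..255
    tokens.extend([PAD_ID] * (max_len - len(tokens)))  # right-pad short inputs
    return tokens


# Tiny example: the first four bytes of a typical PE header, padded to length 8.
seq = bytes_to_sequence(b"MZ\x90\x00", max_len=8)
print(seq)  # [77, 90, 144, 0, 256, 256, 256, 256]
```

Even this simple step shows the scale issue the abstract raises: a single sample at the assumed cap is a sequence of two million tokens, so any per-step cost that grows faster than linearly in sequence length quickly becomes impractical.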



MDEA: Malware Detection with Evolutionary Adversarial Learning

Malware detection have used machine learning to detect malware in progra...

Classifying Sequences of Extreme Length with Constant Memory Applied to Malware Detection

Recent works within machine learning have been tackling inputs of ever-i...

Adversarial Malware Binaries: Evading Deep Learning for Malware Detection in Executables

Machine-learning methods have already been exploited as useful tools for...

R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections

Machine Learning (ML) has found it particularly useful in malware detect...

DexRay: A Simple, yet Effective Deep Learning Approach to Android Malware Detection based on Image Representation of Bytecode

Computer vision has witnessed several advances in recent years, with unp...

I-MAD: A Novel Interpretable Malware Detector Using Hierarchical Transformer

Malware imposes tremendous threats to computer users nowadays. Since sig...

SeqNet: An Efficient Neural Network for Automatic Malware Detection

Malware continues to evolve rapidly, and more than 450,000 new samples a...