Malware Classification with Word Embedding Features

03/03/2021
by Aparna Sunil Kale, et al.

Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on features such as opcode sequences, API calls, and byte n-grams, among many others. In this research, we consider opcode features. We implement hybrid machine learning techniques in which we engineer feature vectors from these opcode sequences in two ways: by training hidden Markov models (a technique we refer to as HMM2Vec) and by training Word2Vec embeddings. The resulting HMM2Vec and Word2Vec embedding vectors are then used as features for classification algorithms. Specifically, we consider support vector machine (SVM), k-nearest neighbor (k-NN), random forest (RF), and convolutional neural network (CNN) classifiers. We conduct extensive experiments over a variety of malware families, and these experiments extend well beyond previous work in this field.
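The HMM2Vec idea can be illustrated with a minimal, pure-Python sketch. It assumes (as we understand HMM2Vec, not as the authors' published code) that an HMM with two hidden states is trained on one sample's opcode sequence with Baum-Welch re-estimation, and that the trained observation-probability matrix B, flattened row by row, serves as that sample's feature vector. Several further choices are assumptions made only for this illustration: the opcode vocabulary is a fixed list of tracked opcodes plus one shared "other" symbol, the hidden-state rows are ordered by their probability of emitting the first vocabulary opcode so that vectors from different samples line up, and k-NN stands in for the downstream classifier (SVM, RF, and CNN are not shown). The names `train_hmm`, `hmm2vec`, and `knn_predict` are hypothetical.

```python
import math
import random


def _near_uniform_rows(rows, cols, rng):
    """Row-stochastic matrix with entries close to 1/cols (Baum-Welch must not start exactly uniform)."""
    mat = []
    for _ in range(rows):
        row = [1.0 + rng.uniform(-0.05, 0.05) for _ in range(cols)]
        total = sum(row)
        mat.append([x / total for x in row])
    return mat


def train_hmm(obs, n_states, n_symbols, iters=50, seed=0):
    """Scaled Baum-Welch re-estimation on one observation sequence; returns (pi, A, B)."""
    if len(obs) < 2:
        raise ValueError("need at least two observations")
    rng = random.Random(seed)
    pi = _near_uniform_rows(1, n_states, rng)[0]
    A = _near_uniform_rows(n_states, n_states, rng)
    B = _near_uniform_rows(n_states, n_symbols, rng)
    T, N = len(obs), n_states
    for _ in range(iters):
        # Forward (alpha) pass, normalizing each step; c[t] holds the scale factors.
        alpha, c = [], []
        for t in range(T):
            if t == 0:
                row = [pi[i] * B[i][obs[0]] for i in range(N)]
            else:
                row = [sum(alpha[t - 1][j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                       for i in range(N)]
            c.append(1.0 / sum(row))
            alpha.append([x * c[t] for x in row])
        # Backward (beta) pass, scaled with the same factors.
        beta = [None] * T
        beta[T - 1] = [c[T - 1]] * N
        for t in range(T - 2, -1, -1):
            beta[t] = [c[t] * sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                       for i in range(N)]
        # Di-gammas and gammas; scaling makes the usual P(O) denominator unnecessary.
        digamma = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                     for j in range(N)] for i in range(N)] for t in range(T - 1)]
        gamma = [[sum(digamma[t][i]) for i in range(N)] for t in range(T - 1)]
        gamma.append(list(alpha[T - 1]))
        # Re-estimate pi, A, and B from the expected counts.
        pi = list(gamma[0])
        for i in range(N):
            denom = sum(gamma[t][i] for t in range(T - 1))
            A[i] = [sum(digamma[t][i][j] for t in range(T - 1)) / denom for j in range(N)]
        for i in range(N):
            denom = sum(gamma[t][i] for t in range(T))
            B[i] = [sum(gamma[t][i] for t in range(T) if obs[t] == k) / denom
                    for k in range(n_symbols)]
    return pi, A, B


def hmm2vec(opcodes, vocab, n_states=2, iters=50):
    """Hypothetical helper: HMM2Vec-style feature vector (flattened B matrix) for one sample."""
    index = {op: k for k, op in enumerate(vocab)}
    # Assumption: every opcode outside `vocab` maps to one shared "other" symbol.
    obs = [index.get(op, len(vocab)) for op in opcodes]
    _, _, B = train_hmm(obs, n_states, len(vocab) + 1, iters)
    # Hidden states have no fixed meaning, so order the B rows before flattening
    # (assumed convention: descending probability of emitting vocab[0]).
    rows = sorted(B, key=lambda row: row[0], reverse=True)
    return [p for row in rows for p in row]


def knn_predict(train, query, k=1):
    """Majority vote among the k (vector, label) pairs nearest to `query` (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

For example, `hmm2vec(["mov", "push", "call", "mov", "jmp"], vocab=["mov", "push", "call"])` returns 2 × 4 = 8 probabilities, with `jmp` counted under the shared "other" symbol; vectors built this way for labeled samples can then be passed to `knn_predict` as `(vector, family)` pairs. Training one small HMM per sample keeps each feature vector a fixed length regardless of how long the opcode sequence is.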
