An Ensemble of Pre-trained Transformer Models For Imbalanced Multiclass Malware Classification

12/25/2021
by Ferhat Demirkıran, et al.

Classification of malware families is crucial for a comprehensive understanding of how they can infect devices, computers, or systems. Malware identification thus enables security researchers and incident responders to take precautions against malware and accelerate mitigation. API call sequences made by malware are widely used as features by machine learning and deep learning models for malware classification, because these sequences represent the behavior of malware. However, traditional machine learning and deep learning models remain incapable of capturing sequence relationships between API calls. Transformer-based models, by contrast, process sequences as a whole and learn relationships between API calls through multi-head attention mechanisms and positional embeddings. Our experiments demonstrate that a transformer model with one transformer block layer surpassed the widely used base architecture, LSTM. Moreover, BERT or CANINE, both pre-trained transformer models, achieved stronger results in classifying highly imbalanced malware families, as measured by F1-score and AUC score. Furthermore, the proposed bagging-based random transformer forest (RTF), an ensemble of BERT or CANINE, reached state-of-the-art evaluation scores on three out of four datasets, including a state-of-the-art F1-score of 0.6149 on one of the commonly used benchmark datasets.
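
The bagging idea behind RTF can be illustrated with a minimal sketch. It is not the authors' implementation: the base learner here is a hypothetical token-count classifier standing in for a fine-tuned BERT or CANINE model, majority voting is an assumed way of combining the base models, macro averaging is an assumed form of the F1-score, and the API call sequences and family labels are toy values made up for illustration.

```python
import random
from collections import Counter, defaultdict

# Toy data (illustrative only): (API call sequence, family label) pairs.
TOY_DATA = [
    (["CreateFileW", "WriteFile", "CloseHandle"], "family_a"),
    (["CreateFileW", "ReadFile"], "family_a"),
    (["RegCreateKeyExW", "RegSetValueExW"], "family_b"),
    (["RegOpenKeyExW", "RegSetValueExW"], "family_b"),
]


def bootstrap_sample(data, rng):
    """Draw len(data) examples with replacement (the bagging step)."""
    return [rng.choice(data) for _ in range(len(data))]


class TokenCountClassifier:
    """Hypothetical stand-in for a fine-tuned transformer base model."""

    def fit(self, examples):
        # Count how often each API call appears per family.
        self.counts = defaultdict(Counter)
        for tokens, label in examples:
            self.counts[label].update(tokens)
        return self

    def predict(self, tokens):
        def score(label):
            c = self.counts[label]
            total = sum(c.values()) or 1  # guard against empty classes
            return sum(c[t] for t in tokens) / total

        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.counts), key=score)


def train_ensemble(examples, n_models, seed=0):
    """Train each base model on its own bootstrap sample."""
    rng = random.Random(seed)
    return [
        TokenCountClassifier().fit(bootstrap_sample(examples, rng))
        for _ in range(n_models)
    ]


def ensemble_predict(models, tokens):
    """Combine base predictions by majority vote (an assumption)."""
    votes = Counter(m.predict(tokens) for m in models)
    return max(sorted(votes), key=votes.get)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare families count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Calling `train_ensemble(TOY_DATA, n_models=5)` and then `ensemble_predict(models, ["WriteFile"])` returns one of the toy family labels. A macro-averaged score is a common choice for imbalanced classes because every family contributes equally regardless of its size.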

research
03/03/2021

Malware Classification with Word Embedding Features

Malware classification is an important and challenging problem in inform...
research
01/27/2022

Aspect-Based API Review Classification: How Far Can Pre-Trained Transformer Model Go?

APIs (Application Programming Interfaces) are reusable software librarie...
research
12/20/2019

Random CapsNet Forest Model for Imbalanced Malware Type Classification Task

Behavior of a malware varies with respect to malware types. Therefore,kn...
research
12/16/2019

Learning Malware Representation based on Execution Sequences

Malware analysis has been extensively investigated as the number and typ...
research
06/09/2023

Early Malware Detection and Next-Action Prediction

In this paper, we propose a framework for early-stage malware detection ...
research
08/09/2022

Online Malware Classification with System-Wide System Calls in Cloud IaaS

Accurately classifying malware in an environment allows the creation of ...
research
12/07/2021

raceBERT – A Transformer-based Model for Predicting Race and Ethnicity from Names

This paper presents raceBERT – a transformer-based model for predicting ...
