Sequence Feature Extraction for Malware Family Analysis via Graph Neural Network

08/10/2022
by S. W. Hsiao, et al.

Malicious software (malware) causes considerable harm to devices and daily life, so understanding malware behavior and the threats it poses is important. Most malware records, such as event logs and dynamic analysis profiles, are variable-length, text-based files with timestamps. Using the timestamps, such data can be sorted into sequences for subsequent analysis. However, variable-length text-based sequences are difficult to handle. In addition, unlike natural language text, most sequential data in information security have specific properties and structure, such as loops, repeated calls, and noise. To analyze API call sequences together with their structure, we represent the sequences as graphs, which allows further investigation of their information and structure, for example through a Markov model. We therefore design and implement an Attention Aware Graph Neural Network (AWGCN) to analyze API call sequences. Through AWGCN, we obtain sequence embeddings that can be used to analyze malware behavior. Moreover, classification experiments show that AWGCN outperforms other classifiers on the call-like datasets, and that the embeddings can further improve the performance of classic models.
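To make the graph representation concrete, the following is a minimal sketch of how one API call sequence could be turned into a first-order, Markov-style transition graph. It is not the authors' AWGCN and does not reproduce their preprocessing. The function name `sequence_to_transition_graph` and the short example trace are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def sequence_to_transition_graph(calls):
    """Build a directed transition graph from one ordered API call sequence.

    Hypothetical helper (not from the paper): nodes are distinct API names,
    edges are consecutive call pairs, and each edge weight is the empirical
    first-order transition probability.
    """
    nodes = sorted(set(calls))
    # Count each consecutive pair (src -> dst); repeated calls become self-loops.
    edge_counts = Counter(zip(calls, calls[1:]))
    # Total outgoing transitions per source node, used for normalization.
    out_totals = defaultdict(int)
    for (src, _dst), n in edge_counts.items():
        out_totals[src] += n
    probs = {(s, d): n / out_totals[s] for (s, d), n in edge_counts.items()}
    return nodes, edge_counts, probs


# Illustrative trace only; it is not taken from any dataset in the paper.
example = [
    "NtOpenFile", "NtReadFile", "NtReadFile", "NtReadFile", "NtClose",
    "NtOpenFile", "NtWriteFile", "NtClose",
]
nodes, counts, probs = sequence_to_transition_graph(example)
for (src, dst), p in sorted(probs.items()):
    print(f"{src} -> {dst}: {p:.2f}")
```

In this sketch a variable-length sequence collapses into a fixed set of nodes and weighted edges, so loops and repeated calls show up as cycles and self-loops instead of making the input longer. A graph neural network would then operate on such a structure, with node features and attention left unspecified here.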


research
06/10/2019

Malware Detection with LSTM using Opcode Language

Nowadays, with the booming development of Internet and software industry...
research
12/30/2019

A New Burrows Wheeler Transform Markov Distance

Prior work inspired by compression algorithms has described how the Burr...
research
12/05/2022

Efficient Malware Analysis Using Metric Embeddings

In this paper, we explore the use of metric learning to embed Windows PE...
research
03/05/2021

NF-GNN: Network Flow Graph Neural Networks for Malware Detection and Classification

Malicious software (malware) poses an increasing threat to the security ...
research
03/28/2021

An In-memory Embedding of CPython for Offensive Use

We offer an embedding of CPython that runs entirely in memory without "t...
research
03/24/2018

Extended Abstract: Mimicry Resilient Program Behavior Modeling with LSTM based Branch Models

In the software design, protecting a computer system from a plethora of ...
research
06/09/2023

Early Malware Detection and Next-Action Prediction

In this paper, we propose a framework for early-stage malware detection ...
