MalPaCA: Malware Packet Sequence Clustering and Analysis

04/02/2019
by   Azqa Nadeem, et al.
0

Malware family characterization is a challenging problem because ground-truth labels are not known. Anti-virus solutions provide labels for malware samples based on their static analysis. However, these labels are known to be inconsistent, causing the evaluation of analysis methods to depend on unreliable ground truth labels. These analysis methods are often black-boxes that make it impossible to verify the assigned family labels. To support malware analysts, we propose a whitebox method named MalPaCA to cluster malware's attacking capabilities reflected in their network traffic. We use sequential features to model temporal behavior. We also propose an intuitive, visualization-based cluster evaluation method to solve interpretability issues. The results show that clustering malware's attacking capabilities provides a more intimate profile of a family's behavior. The identified clusters capture various attacking capabilities, such as port scans and reuse of C&C servers. We discover a number of discrepancies between behavioral clusters and traditional malware family designations. In these cases, behavior within a family group was so varied that many supposedly related malwares had more in common with malware from other families than within their family designation. We also show that sequential features are better suited for modeling temporal behavior than statistical aggregates.

READ FULL TEXT

page 8

page 12

page 16

research
03/07/2021

Cluster Analysis of Malware Family Relationships

In this paper, we use K-means clustering to analyze various relationship...
research
09/23/2021

A Framework for Cluster and Classifier Evaluation in the Absence of Reference Labels

In some problem spaces, the high cost of obtaining ground truth labels n...
research
08/30/2022

AVMiner: Expansible and Semantic-Preserving Anti-Virus Labels Mining Method

With the increase in the variety and quantity of malware, there is an ur...
research
11/18/2022

Clustering based opcode graph generation for malware variant detection

Malwares are the key means leveraged by threat actors in the cyber space...
research
07/31/2020

Identifying meaningful clusters in malware data

Finding meaningful clusters in drive-by-download malware data is a parti...
research
10/22/2020

Malware Traffic Classification: Evaluation of Algorithms and an Automated Ground-truth Generation Pipeline

Identifying threats in a network traffic flow which is encrypted is uniq...
research
11/29/2021

MOTIF: A Large Malware Reference Dataset with Ground Truth Family Labels

Malware family classification is a significant issue with public safety ...

Please sign up or login with your details

Forgot password? Click here to reset