MalPaCA: Malware Packet Sequence Clustering and Analysis
Malware family characterization is a challenging problem because ground-truth labels are not known. Anti-virus solutions provide labels for malware samples based on static analysis. However, these labels are known to be inconsistent, so the evaluation of analysis methods depends on unreliable ground-truth labels. These analysis methods are often black boxes, which makes it impossible to verify the assigned family labels. To support malware analysts, we propose a white-box method named MalPaCA to cluster malware's attacking capabilities as reflected in its network traffic. We use sequential features to model temporal behavior. We also propose an intuitive, visualization-based cluster evaluation method to address interpretability issues. The results show that clustering malware's attacking capabilities provides a more detailed profile of a family's behavior. The identified clusters capture various attacking capabilities, such as port scans and reuse of C&C servers. We discover a number of discrepancies between behavioral clusters and traditional malware family designations. In these cases, behavior within a family was so varied that many supposedly related malware samples had more in common with samples from other families than with those in their own family. We also show that sequential features are better suited for modeling temporal behavior than statistical aggregates.
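To illustrate the closing claim about sequential features versus statistical aggregates, the following is a minimal sketch and not the authors' implementation. It assumes dynamic time warping (DTW) as the sequence distance, which the abstract does not specify, and the packet sizes in `flow_a` and `flow_b` are hypothetical values chosen so that both flows share the same mean.

```python
from statistics import mean


def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Assumption: DTW stands in for "a sequential distance"; the abstract
    does not say which measure is actually used.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Hypothetical packet sizes in bytes: same values, different ordering.
flow_a = [60, 60, 60, 60, 1500, 1500, 1500, 1500]
flow_b = [60, 1500, 60, 1500, 60, 1500, 60, 1500]

# An aggregate feature (mean packet size) cannot tell the flows apart.
aggregate_gap = abs(mean(flow_a) - mean(flow_b))

# A sequence-aware distance sees the different temporal ordering.
sequence_gap = dtw_distance(flow_a, flow_b)

print("difference in mean packet size:", aggregate_gap)
print("DTW distance:", sequence_gap)
```

Under these assumptions the two flows are indistinguishable by their mean packet size, while the sequence distance between them is nonzero, which is the kind of temporal difference an aggregate feature discards.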