AVMiner: Expansible and Semantic-Preserving Anti-Virus Labels Mining Method

08/30/2022
by Ligeng Chen, et al.

With the increase in the variety and quantity of malware, there is an urgent need to speed up malware diagnosis and analysis. Extracting malware family-related tokens from AV (Anti-Virus) labels, which are provided by online anti-virus engines, paves the way for pre-diagnosing malware. Automatically extracting the vital information from AV labels will greatly enhance the detection ability of security enterprises and strengthen the research ability of security analysts. Recent works such as AVCLASS and AVCLASS2 try to extract the attributes of malware from AV labels and establish a taxonomy based on expert knowledge. However, because malicious behaviors are complicated and their trends are uncertain, a system needs the following abilities to meet the challenge: preserving vital semantics, being expansible, and being free from expert knowledge. In this work, we present AVMiner, an expansible malware tagging system that can mine the most vital tokens from AV labels. AVMiner adopts natural language processing techniques and clustering methods to generate, without expert knowledge, a sequence of tokens ranked by importance. AVMiner can update itself when new samples arrive. Finally, we evaluate AVMiner on over 8,000 samples from well-known datasets with manually labeled ground truth; it outperforms previous works.
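
To make the idea of mining ranked tokens from AV labels concrete, here is a minimal Python sketch. It is not AVMiner's method: it swaps the NLP and clustering steps described above for a plain cross-engine vote count, and the engine names, sample labels, and the GENERIC_TOKENS stop list are all hypothetical. Note that a hand-written stop list is itself a small piece of expert knowledge, which is exactly what AVMiner aims to avoid.

```python
import re
from collections import Counter

# Hypothetical stop list of generic tokens (an assumption, not from the paper).
GENERIC_TOKENS = {"trojan", "win32", "generic", "malware", "variant", "gen", "agent"}


def tokenize(label):
    """Lowercase one AV label and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", label.lower()) if t]


def rank_tokens(labels_by_engine):
    """Rank tokens by the number of engines whose label contains them."""
    votes = Counter()
    for label in labels_by_engine.values():
        # Use a set so each token is counted at most once per engine.
        for token in set(tokenize(label)):
            if len(token) > 2 and not token.isdigit() and token not in GENERIC_TOKENS:
                votes[token] += 1
    # Sort by descending vote count, then alphabetically for a stable order.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up labels for a single sample, keyed by made-up engine names.
sample = {
    "EngineA": "Trojan.Win32.Zbot.abc",
    "EngineB": "Win32/Zbot.gen!A",
    "EngineC": "Trojan-Spy.Zbot",
    "EngineD": "Generic.Malware.Kryptik",
}

ranked = rank_tokens(sample)
print(ranked)
```

In this toy input the token shared by most engines rises to the top, while one-off suffixes stay at the bottom of the ranking. A real system would need a more principled notion of importance than raw vote counts.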

research
04/02/2019

MalPaCA: Malware Packet Sequence Clustering and Analysis

Malware family characterization is a challenging problem because ground-...
research
01/21/2021

Malware Detection and Analysis: Challenges and Research Opportunities

Malwares are continuously growing in sophistication and numbers. Over th...
research
06/18/2020

AVClass2: Massive Malware Tag Extraction from AV Labels

Tags can be used by malware repositories and analysis services to enable...
research
01/30/2023

Behavioural Reports of Multi-Stage Malware

The extensive damage caused by malware requires anti-malware systems to ...
research
11/29/2021

MOTIF: A Large Malware Reference Dataset with Ground Truth Family Labels

Malware family classification is a significant issue with public safety ...
research
01/03/2019

A First Look at the Crypto-Mining Malware Ecosystem: A Decade of Unrestricted Wealth

Illicit crypto-mining leverages resources stolen from victims to mine cr...
research
06/07/2022

Marvolo: Programmatic Data Augmentation for Practical ML-Driven Malware Detection

Data augmentation has been rare in the cyber security domain due to tech...
