On Impact of Semantically Similar Apps in Android Malware Datasets

12/05/2021
by Roopak Surendran, et al.

Malware authors reuse program segments found in other applications to perform similar malicious activities, such as stealing information or sending SMS messages. As a result, a malware family or dataset may contain several semantically similar samples. Many researchers are unaware of these semantically similar apps and include their features when evaluating ML models, so the reported performance measures may be seriously affected by them. In this paper, we study the impact of semantically similar applications on the performance measures of ML-based Android malware detectors. To this end, we propose a novel opcode-subsequence-based malware clustering algorithm to identify semantically similar malware and goodware apps. To measure the impact of these apps on performance, we tested distinct ML models based on the API call and permission features of malware and goodware applications, with and without semantically similar apps. In our experiments with the Drebin dataset, we found that after removing exact duplicate apps from the dataset (? = 0), the malware detection rate (TPR) of the API-call-based ML models drops from 0.95 to 0.91, and that of the permission-based model drops from 0.94 to 0.90. To overcome this issue, we advise the research community to use our clustering algorithm to remove semantically similar apps before evaluating their malware detection mechanisms.
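The abstract does not say how the opcode-subsequence clustering works, so the following is only a rough illustration of the general idea, not the authors' algorithm. It substitutes a generic technique: each app is represented by the set of contiguous opcode n-grams, apps are compared by Jaccard distance, and a greedy "leader" clustering groups any app within a distance `threshold` of an existing cluster's first member. All names (`opcode_ngrams`, `cluster_apps`, `deduplicate`, `threshold`, `n`) and the toy opcode sequences are hypothetical. With `threshold=0.0`, only apps with identical n-gram sets are grouped, i.e. exact duplicates under this representation; keeping one app per cluster before the train/test split prevents near-copies from appearing on both sides and inflating TPR.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Gram = Tuple[str, ...]


def opcode_ngrams(opcodes: Sequence[str], n: int = 3) -> FrozenSet[Gram]:
    """Set of contiguous opcode subsequences of length n (hypothetical feature choice)."""
    if len(opcodes) < n:
        # Short methods: treat the whole sequence as a single gram.
        return frozenset([tuple(opcodes)]) if opcodes else frozenset()
    return frozenset(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


def jaccard_distance(a: FrozenSet[Gram], b: FrozenSet[Gram]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def cluster_apps(apps: Dict[str, Sequence[str]], threshold: float, n: int = 3) -> List[List[str]]:
    """Greedy leader clustering: an app joins the first cluster whose leader is within threshold."""
    leaders: List[FrozenSet[Gram]] = []
    clusters: List[List[str]] = []
    for name, opcodes in apps.items():
        feats = opcode_ngrams(opcodes, n)
        for idx, leader_feats in enumerate(leaders):
            if jaccard_distance(feats, leader_feats) <= threshold:
                clusters[idx].append(name)
                break
        else:
            # No leader was close enough: this app starts a new cluster.
            leaders.append(feats)
            clusters.append([name])
    return clusters


def deduplicate(apps: Dict[str, Sequence[str]], threshold: float, n: int = 3) -> List[str]:
    """Keep one representative (the cluster leader) per cluster."""
    return [cluster[0] for cluster in cluster_apps(apps, threshold, n)]


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """TPR = TP / (TP + FN), with 1 = malware."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy example with made-up opcode sequences.
apps = {
    "mal_a": ["const-string", "invoke-virtual", "move-result", "return-void"],
    "mal_b": ["const-string", "invoke-virtual", "move-result", "return-void"],  # exact copy of mal_a
    "mal_c": ["const-string", "invoke-static", "move-result", "return-void"],
}
print(cluster_apps(apps, threshold=0.0))  # [['mal_a', 'mal_b'], ['mal_c']]
print(round(true_positive_rate([1, 1, 1, 0], [1, 0, 1, 1]), 3))  # 0.667
```

The leader scheme is order-dependent and compares each app only against cluster leaders; it was chosen here for brevity, and a real pipeline would more likely use a proper clustering method and opcode sequences extracted from the APKs' Dalvik bytecode.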


Related research

05/22/2019
DaDiDroid: An Obfuscation Resilient Tool for Detecting Android Malware via Weighted Directed Call Graph Modelling
With the number of new mobile malware instances increasing by over 50% a...

05/25/2022
Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors
As in other cybersecurity areas, machine learning (ML) techniques have e...

06/07/2022
Marvolo: Programmatic Data Augmentation for Practical ML-Driven Malware Detection
Data augmentation has been rare in the cyber security domain due to tech...

05/24/2022
Fast Furious: Modelling Malware Detection as Evolving Data Streams
Malware is a major threat to computer systems and imposes many challenge...

07/01/2020
Towards Accurate Labeling of Android Apps for Reliable Malware Detection
In training their newly-developed malware detection methods, researchers...

07/01/2020
Maat: Automatically Analyzing VirusTotal for Accurate Labeling and Effective Malware Detection
The malware analysis and detection research community relies on the onli...

09/03/2022
Illegal But Not Malware: An Underground Economy App Detection System Based on Usage Scenario
This paper focuses on mobile apps serving the underground economy by pro...