Dataset Bias in Android Malware Detection

05/31/2022
by Yan Lin, et al.

Researchers have proposed various malware detection methods to counter the explosive growth of mobile security threats. We argue that reported experimental results are inflated by research bias introduced by the variability of malware datasets. We explore the impact of bias in Android malware detection in three aspects: the method used to flag the ground truth, the distribution of malware families in the dataset, and how the dataset is used. We run a set of experiments with different VirusTotal (VT) thresholds and find that the method used to flag malware data directly affects detection performance. We further compare in detail how the type and composition of malware families affect malware detection; which approach performs best differs across combinations of malware families. Through extensive experiments, we show that the way the dataset is used can have a misleading impact on evaluation, and the resulting performance difference can exceed 40%. The research biases observed in this paper should be carefully controlled or eliminated to enforce a fair comparison of malware detection techniques. Providing reasonable, explainable results is better than reporting only a high detection accuracy with a vaguely specified dataset and experimental setup.
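To make the ground-truth step concrete, here is a minimal Python sketch of threshold-based labeling with VirusTotal detection counts. It assumes a convention common in the Android malware literature, not necessarily the exact rule used in this paper: an app is labeled malware if at least `threshold` engines flag it, benign if no engine flags it, and is otherwise excluded as ambiguous. The `Sample` type, its `vt_positives` field, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Sample:
    sha256: str
    # Hypothetical field: number of VirusTotal engines that flagged this app.
    vt_positives: int


def vt_label(sample: Sample, threshold: int) -> Optional[int]:
    """Return 1 (malware), 0 (benign), or None (ambiguous, excluded).

    Assumed convention: >= threshold detections -> malware,
    zero detections -> benign, anything in between -> dropped.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if sample.vt_positives >= threshold:
        return 1
    if sample.vt_positives == 0:
        return 0
    return None  # flagged by some engines, but fewer than the threshold


def build_ground_truth(samples: Iterable[Sample], threshold: int) -> Dict[str, int]:
    """Map each unambiguous sample's hash to its label under the given threshold."""
    labeled: Dict[str, int] = {}
    for s in samples:
        label = vt_label(s, threshold)
        if label is not None:
            labeled[s.sha256] = label
    return labeled


samples = [Sample("a", 0), Sample("b", 1), Sample("c", 4), Sample("d", 12)]
for t in (1, 4, 10):
    # t=1 keeps all four samples; t=10 keeps only "a" (benign) and "d" (malware).
    print(t, sorted(build_ground_truth(samples, t).items()))
```

Raising the threshold shrinks the labeled set and drops borderline samples, which is one way the choice of threshold can shift the detection performance a method reports.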

Related research

Android Malware Category and Family Detection and Identification using Machine Learning (07/05/2021)
Android malware is one of the most dangerous threats on the internet, an...

Don't Pick the Cherry: An Evaluation Methodology for Android Malware Detection Methods (03/25/2019)
In evaluating detection methods, the malware research community relies o...

TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time (07/20/2018)
Academic research on machine learning-based malware classification appea...

On the impact of dataset size and class imbalance in evaluating machine-learning-based windows malware detection techniques (06/13/2022)
The purpose of this project was to collect and analyse data about the co...

Explainable AI for Android Malware Detection: Towards Understanding Why the Models Perform So Well? (09/02/2022)
Machine learning (ML)-based Android malware detection has been one of th...

Microsoft Malware Classification Challenge (02/22/2018)
The Microsoft Malware Classification Challenge was announced in 2015 alo...

A New Malware Detection System Using a High Performance-ELM method (06/27/2019)
A vital element of a cyberspace infrastructure is cybersecurity. Many pr...
