On building machine learning pipelines for Android malware detection: a procedural survey of practices, challenges and opportunities

by   Masoud Mehrabi Koushki, et al.

As the smartphone market leader, Android has been a prominent target for malware attacks. The number of malicious applications (apps) identified for it has increased continually over the past decade, creating an immense challenge for all parties involved. For market holders and researchers, in particular, the large number of samples has made manual malware detection unfeasible, leading to an influx of research that investigate Machine Learning (ML) approaches to automate this process. However, while some of the proposed approaches achieve high performance, rapidly evolving Android malware has made them unable to maintain their accuracy over time. This has created a need in the community to conduct further research, and build more flexible ML pipelines. Doing so, however, is currently hindered by a lack of systematic overview of the existing literature, to learn from and improve upon the existing solutions. Existing survey papers often focus only on parts of the ML process (e.g., data collection or model deployment), while omitting other important stages, such as model evaluation and explanation. In this paper, we address this problem with a review of 42 highly-cited papers, spanning a decade of research (from 2011 to 2021). We introduce a novel procedural taxonomy of the published literature, covering how they have used ML algorithms, what features they have engineered, which dimensionality reduction techniques they have employed, what datasets they have employed for training, and what their evaluation and explanation strategies are. Drawing from this taxonomy, we also identify gaps in knowledge and provide ideas for improvement and future work.


Towards a Fair Comparison and Realistic Design and Evaluation Framework of Android Malware Detectors

As in other cybersecurity areas, machine learning (ML) techniques have e...

A Survey on the Detection of Android Malicious Apps

Android-based smart devices are exponentially growing, and due to the ub...

Fast Furious: Modelling Malware Detection as Evolving Data Streams

Malware is a major threat to computer systems and imposes many challenge...

Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

Malware analysis and detection techniques have been evolving during the ...

Explainable AI for Android Malware Detection: Towards Understanding Why the Models Perform So Well?

Machine learning (ML)-based Android malware detection has been one of th...

Efficient Query-Based Attack against ML-Based Android Malware Detection under Zero Knowledge Setting

The widespread adoption of the Android operating system has made malicio...

A Survey of Machine Learning Methods and Challenges for Windows Malware Classification

Malware classification is a difficult problem, to which machine learning...

Please sign up or login with your details

Forgot password? Click here to reset