Analysis of Label-Flip Poisoning Attack on Machine Learning Based Malware Detector

by Kshitiz Aryal, et al.

As machine learning (ML) is applied in more domains, the incentives for deceiving these models are greater than ever. Because data is the backbone of ML algorithms, attackers have shifted their interest toward polluting the training data. Data credibility is at even higher risk with the rise of state-of-the-art research topics such as open design principles, federated learning, and crowd-sourcing. ML models depend on many different stakeholders for their data, yet there are no reliable automated mechanisms to verify the veracity of data from each source. Malware detection is arduous because the samples are malicious by nature and evolving samples add metamorphic and polymorphic abilities. ML has proven effective at zero-day malware detection, a problem left unresolved by traditional signature-based approaches. Poisoning the malware training data can allow malware files to go undetected by ML-based malware detectors, helping attackers fulfill their malicious goals. A feasibility analysis of the data poisoning threat in the malware detection domain is still lacking. Our work focuses on two major parts: training ML-based malware detectors and poisoning the training data using the label-poisoning approach. We analyze the robustness of different machine learning models against data poisoning with varying volumes of poisoned data.
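The label-poisoning idea described above can be illustrated with a minimal sketch. It assumes binary labels (0 = benign, 1 = malware) and flips a chosen fraction of them uniformly at random. The function name `flip_labels`, the `poison_rate` parameter, and the toy label list are hypothetical choices made for illustration, not the authors' implementation or dataset.

```python
import random

def flip_labels(labels, poison_rate, seed=0):
    """Return a copy of binary labels with a fraction flipped (0 <-> 1).

    Hypothetical helper: poison_rate is the fraction of training labels
    an attacker is assumed to control.
    """
    if not 0.0 <= poison_rate <= 1.0:
        raise ValueError("poison_rate must be in [0, 1]")
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n_flip = int(round(poison_rate * len(labels)))
    # Choose distinct positions to poison, without replacement.
    flip_idx = set(rng.sample(range(len(labels)), n_flip))
    return [1 - y if i in flip_idx else y for i, y in enumerate(labels)]

# Toy labels standing in for a real training set (assumption).
clean = [0, 1] * 50

# Vary the poisoning volume and count how many labels changed.
for rate in (0.0, 0.1, 0.2):
    poisoned = flip_labels(clean, rate)
    changed = sum(a != b for a, b in zip(clean, poisoned))
    print(f"poison_rate={rate:.1f} flipped={changed}")
```

In a robustness study of the kind the abstract outlines, each poisoned label set would be used to retrain a detector, and its accuracy on clean test data would be compared across poisoning rates.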




Beyond the Hype: A Real-World Evaluation of the Impact and Cost of Machine Learning–Based Malware Detection

There is a lack of scientific testing of commercially available malware ...

Towards Improving the Trustworthiness of Hardware based Malware Detector using Online Uncertainty Estimation

Hardware-based Malware Detectors (HMDs) using Machine Learning (ML) mode...

HAPSSA: Holistic Approach to PDF Malware Detection Using Signal and Statistical Analysis

Malicious PDF documents present a serious threat to various security org...

Marvolo: Programmatic Data Augmentation for Practical ML-Driven Malware Detection

Data augmentation has been rare in the cyber security domain due to tech...

ML-based IoT Malware Detection Under Adversarial Settings: A Systematic Evaluation

The rapid growth of the Internet of Things (IoT) devices is paralleled b...

Towards interpreting ML-based automated malware detection models: a survey

Malware is being increasingly threatening and malware detectors based on...

Symbolic analysis meets federated learning to enhance malware identifier

Over past years, the manually methods to create detection rules were no ...
