Machine Unlearning: Learning, Polluting, and Unlearning for Spam Email

11/26/2021
by Nishchal Parne, et al.

This work studies machine unlearning in a security context. Several spam email detection methods exist, each using a different algorithm to detect unwanted email, but these models are vulnerable to attack: many attackers exploit a model by polluting its training data in various ways. To respond quickly in such situations, a model needs to unlearn the polluted data without being retrained. Retraining is impractical in most cases because the model has already been trained on a massive amount of data, all of which would have to be processed again just to remove a small amount of polluted data, often significantly less than 1%. Unlearning frameworks for spam detection models address this problem. In this research, an unlearning module is integrated into spam detection models based on the Naive Bayes, Decision Tree, and Random Forest algorithms. To assess the benefits of unlearning over retraining, the three spam detection models are polluted and exploited from an attacker's position, demonstrating their vulnerability. In each case, accuracy and true positive rate drop, showing the effect of pollution on the models. The unlearning modules are then integrated into the models and the polluted data is unlearned; testing the models after unlearning shows that performance is restored. Unlearning and retraining times are also compared across different pollution data sizes on all models. The findings indicate that unlearning is considerably superior to retraining: it is fast, easy to implement, easy to use, and effective.
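
The abstract does not describe how the unlearning module is implemented. As a rough illustration of the idea for the Naive Bayes case, the sketch below assumes a multinomial Naive Bayes classifier that stores raw counts, so that a previously learned email can be removed by subtracting its counts instead of retraining. The class name `CountingNaiveBayes`, its `learn` and `unlearn` methods, and the toy emails are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


class CountingNaiveBayes:
    """Hypothetical count-based multinomial Naive Bayes (not the paper's code)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.doc_counts = Counter()                              # label -> emails seen
        self.word_counts = {lab: Counter() for lab in self.LABELS}  # label -> word -> count

    def _update(self, words, label, sign):
        # Add (sign=+1) or subtract (sign=-1) one email's contribution.
        self.doc_counts[label] += sign
        if self.doc_counts[label] == 0:
            del self.doc_counts[label]
        for w in words:
            self.word_counts[label][w] += sign
            if self.word_counts[label][w] == 0:
                del self.word_counts[label][w]  # drop zero entries so state matches a clean model

    def learn(self, words, label):
        self._update(words, label, +1)

    def unlearn(self, words, label):
        # Exact inverse of learn(); only valid for an email that was learned earlier.
        self._update(words, label, -1)

    def predict(self, words):
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.LABELS:
            # Laplace-smoothed log prior plus log likelihood of each word.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))
            total_words = sum(self.word_counts[label].values())
            for w in words:
                score += math.log(
                    (self.word_counts[label][w] + 1) / (total_words + len(vocab) + 1)
                )
            scores[label] = score
        return max(scores, key=scores.get)


# Toy data (assumed for illustration only).
clean = [
    (["meeting", "tomorrow"], "ham"),
    (["free", "prize", "click"], "spam"),
    (["lunch", "tomorrow"], "ham"),
    (["win", "free", "prize"], "spam"),
]
# Pollution: an attacker injects spam-like emails labeled as ham.
polluted = [(["free", "prize"], "ham")] * 5

model = CountingNaiveBayes()
for words, label in clean + polluted:
    model.learn(words, label)
prediction_polluted = model.predict(["free", "prize"])

for words, label in polluted:
    model.unlearn(words, label)
prediction_unlearned = model.predict(["free", "prize"])

# Reference model retrained from scratch on the clean data only.
retrained = CountingNaiveBayes()
for words, label in clean:
    retrained.learn(words, label)
```

On this toy data the polluted model classifies `["free", "prize"]` as ham, and after unlearning it classifies the same email as spam again. Because the counts are additive, the unlearned model ends up in the same state as the model retrained on clean data, at a cost proportional to the size of the polluted set, not the full training set. Tree-based models such as Decision Trees and Random Forests do not decompose this simply, so this sketch says nothing about how the paper handles them.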


