FedEmail: Performance Measurement of Privacy-friendly Phishing Detection Enabled by Federated Learning

07/27/2020
by   Chandra Thapa, et al.

Artificial intelligence (AI) has been applied to phishing email detection. Typically, it requires rich email data from a collection of sources, and the data usually contains private information that needs to be protected. So far, AI techniques have focused solely on centralized training, which eventually accesses sensitive raw email data from the collected data repository. Thus, a privacy-friendly AI technique such as federated learning (FL) is a desideratum. FL enables learning over distributed email datasets in a distributed computing framework and protects their privacy, because the raw data need not be accessed during learning. This work, to the best of our knowledge, is the first to investigate the applicability of training an email anti-phishing model via FL. Building upon the Recurrent Convolutional Neural Network for phishing email detection, we comprehensively measure and evaluate the FL-entangled learning performance under various settings, including balanced and imbalanced data distribution among clients, scalability, communication overhead, and transfer learning. Our results positively corroborate that FL achieves performance comparable to centralized learning in phishing email detection. As a trade-off for privacy and distributed learning, FL has a communication overhead of 0.179 GB per global epoch per client. Our measurement-based results find that FL is suitable for practical scenarios in which data size variation among the clients, including variation in the ratio of phishing to legitimate email samples, is present. In all these scenarios, FL shows a similar testing accuracy of around 98%. [...] with time in FL via transfer learning to improve the client-level performance. The transfer learning-enabled training improves the testing accuracy by up to 2.6% [...]
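
To make the aggregation and the overhead figure concrete, here is a minimal sketch, not the authors' implementation. It assumes the common federated averaging scheme, in which the server averages client parameters weighted by local sample count; the abstract does not state which aggregation rule the paper uses. The names `federated_average` and `total_communication_gb` are hypothetical, and the overhead helper assumes the reported 0.179 GB applies to each client in each global epoch.

```python
from typing import List, Sequence

# Figure reported in the abstract; read here as "per client, per global epoch"
# (an assumption about the abstract's wording).
GB_PER_CLIENT_PER_EPOCH = 0.179


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample count.

    Clients holding more emails contribute proportionally more, which is
    one way to handle imbalanced data sizes among clients.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


def total_communication_gb(num_clients: int, global_epochs: int,
                           gb_per_client_per_epoch: float = GB_PER_CLIENT_PER_EPOCH) -> float:
    """Rough total traffic if every client takes part in every global epoch."""
    return num_clients * global_epochs * gb_per_client_per_epoch


# Two clients with imbalanced data: the second holds three times as many samples.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only model parameters cross the network in this scheme; the raw emails stay with each client, which is the privacy property the abstract describes.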

research
05/26/2022

Mixed Federated Learning: Joint Decentralized and Centralized Learning

Federated learning (FL) enables learning from decentralized privacy-sens...
research
01/05/2022

Sample Selection with Deadline Control for Efficient Federated Learning on Heterogeneous Clients

Federated Learning (FL) trains a machine learning model on distributed c...
research
03/30/2020

End-to-End Evaluation of Federated Learning and Split Learning for Internet of Things

This work is the first attempt to evaluate and compare federated learni...
research
07/12/2020

VAFL: a Method of Vertical Asynchronous Federated Learning

Horizontal Federated learning (FL) handles multi-client data that share ...
research
05/24/2020

Reliability and Performance Assessment of Federated Learning on Clinical Benchmark Data

As deep learning has been applied in a clinical context, privacy concer...
research
05/18/2021

Federated Learning With Highly Imbalanced Audio Data

Federated learning (FL) is a privacy-preserving machine learning method ...
research
11/30/2022

On the Design of Communication-Efficient Federated Learning for Health Monitoring

With the booming deployment of Internet of Things, health monitoring app...
