Man versus Machine: AutoML and Human Experts' Role in Phishing Detection

08/27/2021
by Rizka Purwanto, et al.

Machine learning (ML) has developed rapidly in the past few years and has been successfully applied to a broad range of tasks, including phishing detection. However, building an effective ML-based detection system is not trivial and requires data scientists with knowledge of the relevant domain. Automated Machine Learning (AutoML) frameworks have received considerable attention in recent years because they enable non-ML experts to build machine learning models. This raises the intriguing question of whether AutoML can outperform human data scientists. Our paper compares the performance of six well-known, state-of-the-art AutoML frameworks on ten different phishing datasets to determine whether AutoML-based models can outperform manually crafted machine learning models. Our results indicate that AutoML-based models can outperform manually developed machine learning models on complex classification tasks, specifically on datasets whose features are not highly discriminative and on datasets with overlapping classes or relatively high degrees of non-linearity. Challenges remain, however, in building a real-world phishing detection system with AutoML frameworks: current frameworks support only supervised classification problems, which creates a need for labeled data, and they cannot update AutoML-based models incrementally. This indicates that experts with knowledge of phishing and cybersecurity are still essential in the loop of the phishing detection pipeline.
