URLTran: Improving Phishing URL Detection Using Transformers

06/09/2021
by Pranav Maneriker, et al.

Browsers often include security features to detect phishing web pages. In the past, some browsers checked an unknown URL against a list of known phishing pages. However, as the number of URLs and known phishing pages grew rapidly, browsers began to include one or more machine learning classifiers in their security services to better protect end users from harm. While additional information could be used, browsers typically evaluate every unknown URL with a classifier in order to detect phishing pages quickly. Early phishing detection used standard machine learning classifiers, but recent research has instead proposed deep learning models for the phishing URL detection task. Concurrently, text embedding research using transformers has led to state-of-the-art results in many natural language processing tasks. In this work, we perform a comprehensive analysis of transformer models on the phishing URL detection task. We consider the standard masked language model and additional domain-specific pre-training tasks, and compare these models to fine-tuned BERT and RoBERTa models. Combining the insights from these experiments, we propose URLTran, which uses transformers to significantly improve the performance of phishing URL detection over a wide range of very low false positive rates (FPRs) compared to other deep learning-based methods. For example, URLTran yields a true positive rate (TPR) of 86.80% at an FPR of 0.01%. We also consider some classical adversarial black-box phishing attacks, such as those based on homoglyphs and compound word splits, to improve the robustness of URLTran. We consider additional fine-tuning with these adversarial samples and demonstrate that URLTran can maintain low FPRs under these scenarios.
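
The abstract reports results as a TPR at a fixed, very low FPR. As a rough illustration of how such a figure can be computed from classifier scores, here is a minimal sketch in plain Python. The function name `tpr_at_fpr`, the label convention (1 = phishing, 0 = benign), and the toy data are assumptions made for this example; they are not taken from the paper or its code.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Best true positive rate reachable while keeping FPR <= max_fpr.

    scores: higher means "more likely phishing" (assumed convention).
    labels: 1 for phishing, 0 for benign (assumed convention).
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one phishing and one benign example")

    # Walk thresholds from the highest score downward.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        # Examples with equal scores fall on the same side of any threshold,
        # so they are counted together.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / negatives > max_fpr:
            break  # FPR only grows from here, so stop
        best_tpr = tp / positives
        i = j
    return best_tpr


# Toy data, invented for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.0))
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.2))
```

At an operating point such as an FPR of 0.01%, the benign evaluation set has to be large (at least tens of thousands of URLs) for the estimate to be meaningful, since a single false positive already moves the FPR by 1/negatives.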


