DNS Typo-squatting Domain Detection: A Data Analytics Machine Learning Based Approach

12/25/2020
by Abdallah Moubayed, et al.

The Domain Name System (DNS) is a crucial component of current IP-based networks, as it is the standard mechanism for name-to-IP resolution. However, because it lacks data integrity and origin authentication processes, it is vulnerable to a variety of attacks. One such attack is typosquatting. Detecting this attack is particularly important, as it can threaten corporate secrets and can be used to steal information or commit fraud. In this paper, a machine learning-based approach is proposed to tackle the typosquatting vulnerability. To that end, exploratory data analytics is first used to better understand the trends observed in eight features extracted from domain names. Furthermore, a majority voting-based ensemble learning classifier built from five classification algorithms is proposed that can detect suspicious domains with high accuracy. The observed trends are then validated by studying the same features in an unlabeled dataset, both with the K-means clustering algorithm and by applying the developed ensemble learning classifier. Results show that legitimate domains have shorter domain names and fewer unique characters. Moreover, the developed ensemble learning classifier performs better in terms of accuracy, precision, and F-score. Similar trends are observed when clustering is used; however, the number of domains it identifies as potentially suspicious is high. Applying the ensemble learning classifier instead reduces the number of domains identified as potentially suspicious by almost a factor of five while maintaining the same trends in the features' statistics.
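
To make the majority-voting idea concrete, the following is a minimal sketch and not the authors' implementation. It computes only the two name-based features the abstract names (domain name length and number of unique characters) and combines binary votes by simple majority. The function names, the label convention (1 = suspicious, 0 = legitimate), and the notion of a classifier as a callable over a feature dictionary are all assumptions made for illustration; the abstract does not list the eight features or the five algorithms.

```python
from typing import Callable, Dict, List

# Hypothetical type: a base classifier maps a feature dict to a label
# (1 = suspicious, 0 = legitimate). The paper's five algorithms are not
# named in the abstract, so any callable of this shape can stand in.
Classifier = Callable[[Dict[str, int]], int]


def extract_features(domain: str) -> Dict[str, int]:
    """Compute the two name-based features mentioned in the abstract."""
    name = domain.lower().rstrip(".")  # normalize case, drop trailing root dot
    return {
        "length": len(name),             # domain name length
        "unique_chars": len(set(name)),  # number of distinct characters
    }


def majority_vote(votes: List[int]) -> int:
    """Return 1 if strictly more than half of the binary votes are 1."""
    return 1 if sum(votes) * 2 > len(votes) else 0


def ensemble_predict(classifiers: List[Classifier], domain: str) -> int:
    """Collect one vote per base classifier and combine them by majority."""
    features = extract_features(domain)
    votes = [clf(features) for clf in classifiers]
    return majority_vote(votes)
```

With an odd number of voters, such as the five classifiers mentioned in the abstract, a binary majority vote can never tie, which is one practical reason to choose an odd ensemble size.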

research
06/08/2020

Ensemble-based Feature Selection and Classification Model for DNS Typo-squatting Detection

Domain Name System (DNS) plays an important role in the current IP-based...
research
06/14/2020

Fake Reviews Detection through Ensemble Learning

Customers represent their satisfactions of consuming products by sharing...
research
12/16/2020

Optimized Random Forest Model for Botnet Detection Based on DNS Queries

The Domain Name System (DNS) protocol plays a major role in today's Inte...
research
07/24/2020

Genome Sequence Classification for Animal Diagnostics with Graph Representations and Deep Neural Networks

Bovine Respiratory Disease Complex (BRDC) is a complex respiratory disea...
research
09/17/2020

Improving Homograph Attack Classification

A visual homograph attack is a way that the attacker deceives the web us...
research
11/05/2018

Active Deep Learning Attacks under Strict Rate Limitations for Online API Calls

Machine learning has been applied to a broad range of applications and s...
research
04/13/2018

Adversarial Clustering: A Grid Based Clustering Algorithm Against Active Adversaries

Nowadays more and more data are gathered for detecting and preventing cy...
