PhishGAN: Data Augmentation and Identification of Homoglyph Attacks

06/24/2020
by Joon Sern Lee, et al.

Homoglyph attacks are a common technique used by hackers to conduct phishing. Domain names or links that are visually similar to actual ones are created via punycode to obfuscate the attack, making the victim more susceptible to phishing. For example, victims may mistake "|inkedin.com" for "linkedin.com" and, in the process, divulge personal details to the fake website. Current state-of-the-art (SOTA) approaches typically make use of string comparison algorithms (e.g. Levenshtein distance), which are computationally heavy. One reason for this is the lack of publicly available datasets, which hinders the training of more advanced machine learning (ML) models. Furthermore, no single font is able to render all types of punycode correctly, posing a significant challenge to the creation of a dataset that is unbiased toward any particular font. This, coupled with the vast number of internet domains, makes it difficult to create a dataset that captures all possible variations. Here, we show how a conditional Generative Adversarial Network (GAN), PhishGAN, can be used to generate images of homoglyphs, conditioned on non-homoglyph input text images. Practical changes to current SOTA were required to facilitate the generation of more varied homoglyph text-based images. We also demonstrate a workflow in which PhishGAN, together with a Homoglyph Identifier (HI) model, can be used to identify the domain the homoglyph was trying to imitate. Furthermore, we demonstrate how PhishGAN's ability to generate datasets on the fly facilitates the quick adaptation of cybersecurity systems to detect new threats as they emerge.
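
To make the string-comparison baseline mentioned in the abstract concrete, the following is a minimal Python sketch of Levenshtein-distance matching against a list of known domains. It illustrates only the baseline, not PhishGAN or the Homoglyph Identifier model. The helper `closest_domain`, the example domain list, and the Cyrillic look-alike string are assumptions introduced here for illustration and do not come from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest_domain(candidate: str, known_domains: list[str]) -> tuple[str, int]:
    """Hypothetical helper: return the known domain nearest to `candidate`."""
    scored = ((domain, levenshtein(candidate, domain)) for domain in known_domains)
    return min(scored, key=lambda pair: pair[1])


# Illustrative list of protected domains (an assumption, not a real dataset).
KNOWN = ["linkedin.com", "google.com"]

# The example from the abstract: '|' stands in for 'l'.
match = closest_domain("|inkedin.com", KNOWN)

# A Unicode look-alike: Cyrillic 'і' (U+0456) replaces Latin 'i'.
spoof_label = "l\u0456nkedin"
# Python's built-in "punycode" codec gives the ASCII form of the label;
# a registered domain would carry it with an "xn--" prefix.
ascii_form = spoof_label.encode("punycode")
```

Every candidate must be compared against every protected domain, and each comparison costs time proportional to the product of the two string lengths, which is one way to see why this approach becomes expensive at scale.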

research
05/24/2018

Detecting Homoglyph Attacks with a Siamese Neural Network

A homoglyph (name spoofing) attack is a common technique used by adversa...
research
12/22/2022

GAN-based Domain Inference Attack

Model-based attacks can infer training data information from deep neural...
research
09/24/2018

Learning to Detect Fake Face Images in the Wild

Although Generative Adversarial Network (GAN) can be used to generate th...
research
11/14/2019

DomainGAN: Generating Adversarial Examples to Attack Domain Generation Algorithm Classifiers

Domain Generation Algorithms (DGAs) are frequently used to generate larg...
research
10/06/2016

DeepDGA: Adversarially-Tuned Domain Generation and Detection

Many malware families utilize domain generation algorithms (DGAs) to est...
research
01/12/2023

Open SESAME: Fighting Botnets with Seed Reconstructions of Domain Generation Algorithms

An important aspect of many botnets is their capability to generate pseu...
research
09/02/2022

TypoSwype: An Imaging Approach to Detect Typo-Squatting

Typo-squatting domains are a common cyber-attack technique. It involves ...
