Look Before You Leap: Detecting Phishing Web Pages by Exploiting Raw URL And HTML Characteristics
Cybercriminals resort to phishing as a simple and cost-effective medium to perpetrate cyber-attacks on today's Internet. Recent studies in phishing detection are increasingly adopting automated feature selection over traditional manually engineered features. This transition is due to the inability of existing traditional methods to extrapolate their learning to new data. To this end, in this paper, we propose WebPhish, a deep learning technique using automatic feature selection extracted from the raw URL and HTML of a web page. This approach is the first of its kind, which uses the concatenation of URL and HTML embedding feature vectors as input into a Convolutional Neural Network model to detect phishing attacks on web pages. Extensive experiments on a real-world dataset yielded an accuracy of 98 percent, outperforming other state-of-the-art techniques. Also, WebPhish is a client-side strategy that is completely language-independent and can conduct lightweight phishing detection regardless of the web page's textual language.
READ FULL TEXT