A systematic framework to discover pattern for web spam classification

11/19/2017
by Hamed Jelodar, et al.

Web spam is a major problem for search engine users on the World Wide Web. Spam websites use deceptive techniques to achieve high rankings. Although many researchers have proposed different approaches to web spam classification and detection, it remains an open issue in computer science. Analyzing and evaluating these websites can be an effective step toward discovering and categorizing their features. Several methods and algorithms exist for detecting such websites, such as decision tree algorithms. In this paper, we present a systematic framework based on the CHAID algorithm and a modified string matching algorithm (KMP) for extracting features from and analyzing these websites. We evaluated our model and other methods on a dataset of the Alexa Top 500 Global Sites and Bing search engine results for 500 queries.
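The abstract names KMP (Knuth-Morris-Pratt) string matching as the basis for feature extraction but does not describe the modification. As a rough illustration only, the sketch below implements the standard, unmodified KMP algorithm in Python and uses it to count keyword occurrences in page text, which is one plausible kind of feature. The function names (`build_failure_table`, `kmp_count`, `keyword_counts`) and the sample keywords are hypothetical and do not come from the paper.

```python
from typing import Dict, List


def build_failure_table(pattern: str) -> List[int]:
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter candidate prefix.
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_count(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text (overlaps included)
    using standard KMP, in O(len(text) + len(pattern)) time."""
    if not pattern:
        return 0
    failure = build_failure_table(pattern)
    count = 0
    k = 0  # number of pattern characters matched so far
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            count += 1
            k = failure[k - 1]  # keep scanning for overlapping matches
    return count


def keyword_counts(page_text: str, keywords: List[str]) -> Dict[str, int]:
    """Hypothetical feature extractor: case-insensitive keyword counts."""
    lowered = page_text.lower()
    return {kw: kmp_count(lowered, kw.lower()) for kw in keywords}


if __name__ == "__main__":
    sample = "Cheap pills, cheap loans"
    print(keyword_counts(sample, ["cheap", "casino"]))
```

Counts like these could serve as input columns for a decision tree learner such as CHAID, although the abstract does not say which features the authors actually used.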

research
12/02/2020

Web Crawler: Design And Implementation For Extracting Article-Like Contents

The World Wide Web is a large, wealthy, and accessible information syste...
research
02/10/2021

Dot-Science Top Level Domain: academic websites or dumpsites?

Dot-science was launched in 2015 as a new academic top-level domain (TLD...
research
01/19/2020

Intelligent Methods for Accurately Detecting Phishing Websites

With increasing technology developments, there is a massive number of we...
research
03/19/2019

BotGraph: Web Bot Detection Based on Sitemap

The web bots have been blamed for consuming large amount of Internet tra...
research
04/28/2015

CommentWatcher: An Open Source Web-based platform for analyzing discussions on web forums

We present CommentWatcher, an open source tool aimed at analyzing discus...
research
09/12/2023

Cookiescanner: An Automated Tool for Detecting and Evaluating GDPR Consent Notices on Websites

The enforcement of the GDPR led to the widespread adoption of consent no...
research
04/14/2013

Unveiling the link between logical fallacies and web persuasion

In the last decade Human-Computer Interaction (HCI) has started to focus...
