Detection of Change Frequency in Web Pages to Optimize Server-based Scheduling

03/06/2022
by Dulani Meedeniya, et al.

The Internet has become vast and dynamic, with an ever-increasing number of web pages, and these pages change as content is added to them. Change detection and notification systems have made it simpler to keep track of the changes occurring in web pages. However, most of these systems rely on predefined crawling schedules with static time intervals. A static schedule is inefficient when no relevant changes are being made to a page, because crawling it wastes both time and computational resources. Conversely, if a page is not crawled frequently enough, important changes may be missed and subscribed users may be notified late. This paper proposes a methodology to detect the frequency of change in web pages in order to optimize the server-side scheduling of change detection and notification systems. The proposed method is based on a dynamic detection process in which the crawling schedule is adjusted according to the detected change frequency, resulting in a more efficient server-based scheduler for detecting changes in web pages.
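The abstract does not specify how the schedule is adjusted, so the following is only a minimal sketch of one way a dynamic crawling interval could work, not the authors' method. It assumes a simple multiplicative rule: shorten the interval when a crawl finds a change, lengthen it when it does not. The class name `CrawlSchedule`, the helper `change_frequency`, and all default values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CrawlSchedule:
    """Hypothetical adaptive schedule for one web page (illustrative only)."""
    interval_s: float = 3600.0       # current gap between crawls, in seconds
    min_interval_s: float = 300.0    # assumed lower bound: never crawl more often
    max_interval_s: float = 86400.0  # assumed upper bound: never crawl less often
    factor: float = 2.0              # assumed multiplicative step size

    def update(self, changed: bool) -> float:
        """Adjust the interval after a crawl and return the new value."""
        if changed:
            # The page changed: crawl sooner next time, within the lower bound.
            self.interval_s = max(self.min_interval_s, self.interval_s / self.factor)
        else:
            # No change: back off to save resources, within the upper bound.
            self.interval_s = min(self.max_interval_s, self.interval_s * self.factor)
        return self.interval_s


def change_frequency(observations: Sequence[bool]) -> float:
    """Fraction of past crawls that detected a change (0.0 if no history)."""
    if not observations:
        return 0.0
    return sum(observations) / len(observations)


if __name__ == "__main__":
    schedule = CrawlSchedule()
    history = [True, False, False, True]
    for changed in history:
        print(changed, schedule.update(changed))
    print("observed change frequency:", change_frequency(history))
```

Under these assumptions, a page that keeps changing converges toward the minimum interval, while a static page drifts toward the maximum, which addresses both the wasted-crawl and missed-change cases described above.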

