Adaptive technique for web page change detection using multi-threaded crawlers

03/06/2022
by Dulani Meedeniya, et al.

The World Wide Web is growing denser as new web pages and resources are created daily. Keeping track of changes in web content has become an immense challenge and a research problem of great significance. Even search engines need to detect changes on the web to keep their search indexes up to date. Numerous studies have been carried out on optimizing change detection algorithms. This paper presents a methodology named Multi-Threaded Crawler for Change Detection of Web (MTCCDW), which is inspired by the producer-consumer problem. The proposed change detection process mainly analyses performance and suggests a thread-based implementation for optimising change detection. The experimental results show that the proposed methodology is capable of reducing the effective time to detect changes in a web page by 93.51%.
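The abstract does not spell out how MTCCDW is implemented, so the following is only a minimal sketch of the general producer-consumer idea it refers to, not the authors' method. One producer thread places URLs on a shared queue and several consumer threads fetch each page and compare a content hash against the last known hash. The names `detect_changes`, `fetch`, `known_hashes` and `num_workers` are hypothetical, and hash comparison is an assumed stand-in for whatever change detection the paper actually uses.

```python
import hashlib
import queue
import threading


def detect_changes(urls, fetch, known_hashes, num_workers=4):
    """Return the set of URLs whose content hash differs from known_hashes.

    fetch is any callable mapping a URL to its page text (an assumption:
    the real crawler would perform an HTTP request here).
    known_hashes is updated in place with the newly computed hashes.
    """
    tasks = queue.Queue()
    changed = set()
    lock = threading.Lock()

    def producer():
        # Producer: enqueue every URL that should be checked.
        for url in urls:
            tasks.put(url)
        # One sentinel per consumer tells each worker to stop.
        for _ in range(num_workers):
            tasks.put(None)

    def consumer():
        # Consumer: fetch a page, hash it, and compare with the stored hash.
        while True:
            url = tasks.get()
            if url is None:
                break
            digest = hashlib.sha256(fetch(url).encode("utf-8")).hexdigest()
            with lock:
                if known_hashes.get(url) != digest:
                    changed.add(url)
                    known_hashes[url] = digest

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return changed
```

In this sketch a page seen for the first time counts as changed, because it has no stored hash. Passing `fetch` in as a parameter keeps the example runnable without network access; a real crawler would likely also need timeouts, retries and politeness delays.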

research
03/06/2022

Detection of Change Frequency in Web Pages to Optimize Server-based Scheduling

The Internet at present has become vast and dynamic with the ever increa...
research
10/26/2022

WebCrack: Dynamic Dictionary Adjustment for Web Weak Password Detection based on Blasting Response Event Discrimination

The feature diversity of different web systems in page elements, submiss...
research
04/30/2023

Making Changes in Webpages Discoverable: A Change-Text Search Interface for Web Archives

Webpages change over time, and web archives hold copies of historical ve...
research
08/28/2019

HTMLPhish: Enabling Accurate Phishing Web Page Detection by Applying Deep Learning Techniques on HTML Analysis

Recently, the development and implementation of phishing attacks require...
research
03/06/2022

Optimizing Change Detection in Distributed Digital Collections: An Architectural Perspective of Change Detection

Digital documents are likely to have problems associated with the persis...
research
04/05/2021

Managing Research the Wiki Way: A Systematic Approach to Documenting Research

As a master's student, knowing how to manage your personal research is n...
research
04/16/2020

Toward Efficient Web Publishing with Provenance of Information Using Trusty URIs: Applying the proposed model with the Quran

This research presents a methodology for trusting the provenance of data...
