Intelligent Self-Repairable Web Wrappers

06/20/2011
by Emilio Ferrara, et al.

The amount of information available on the Web grows at an incredibly high rate. Systems and procedures for extracting these data from Web sources already exist, and a range of approaches and techniques has been investigated in recent years. On the one hand, reliable solutions should provide robust Web data mining algorithms that can automatically cope with malfunctions or failures. On the other hand, the literature offers few solutions for the maintenance of these systems. Procedures that extract Web data may be tightly coupled to the structure of the data source itself; thus, malfunctions or the acquisition of corrupted data can be caused, for example, by structural modifications that owners make to their data sources. At present, data integrity verification and maintenance are mostly performed manually to ensure that these systems work correctly and reliably. In this paper we propose a novel approach for creating procedures that extract data from Web sources (so-called Web wrappers) which can cope with malfunctions caused by modifications to the structure of the data source and can automatically repair themselves.
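
To make the idea of a wrapper that breaks on a structural change and then repairs itself more concrete, here is a minimal Python sketch. It is not the method proposed in the paper, whose algorithm is not described in this abstract. It assumes a wrapper is a fixed path into a well-formed XHTML document, and it swaps in a plain string similarity over tag names and attributes as a stand-in for whatever matching technique the authors use. The class `SelfRepairingWrapper`, the helper functions, the threshold, and both sample pages are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def signature(elem):
    """Describe an element by its tag and sorted attributes (assumed feature set)."""
    attrs = " ".join(f"{k}={v}" for k, v in sorted(elem.attrib.items()))
    return f"{elem.tag} {attrs}"


def similarity(a, b):
    """String similarity of two element signatures, in the range [0, 1]."""
    return difflib.SequenceMatcher(None, signature(a), signature(b)).ratio()


def path_to(root, target):
    """Build a positional ElementTree path leading from root to target."""
    if root is target:
        return "."
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parents[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        steps.append(f"{node.tag}[{same_tag.index(node) + 1}]")
        node = parent
    return "./" + "/".join(reversed(steps))


class SelfRepairingWrapper:
    """Hypothetical wrapper: extracts one element's text and re-learns its path on failure."""

    def __init__(self, path, sample_root, threshold=0.6):
        self.path = path
        self.threshold = threshold
        # Remember what a correct extraction looks like on a known-good page.
        self.example = sample_root.find(path)
        if self.example is None:
            raise ValueError("path does not match the sample page")

    def extract(self, root):
        found = root.find(self.path)
        # Verification step: the path must still hit a sufficiently similar element.
        if found is None or similarity(found, self.example) < self.threshold:
            found = self._repair(root)
        self.example = found
        return found.text

    def _repair(self, root):
        # Pick the element that most resembles the last good extraction.
        best = max(root.iter(), key=lambda e: similarity(e, self.example))
        if similarity(best, self.example) < self.threshold:
            raise LookupError("no sufficiently similar element; manual repair needed")
        self.path = path_to(root, best)
        return best


# Hypothetical pages: the owner later adds a banner and wraps the price in <p>.
OLD_PAGE = "<html><body><div><span class='price'>10 EUR</span></div></body></html>"
NEW_PAGE = (
    "<html><body><div class='banner'>Sale</div>"
    "<div><p><span class='price'>12 EUR</span></p></div></body></html>"
)

wrapper = SelfRepairingWrapper("./body/div/span", ET.fromstring(OLD_PAGE))
print(wrapper.extract(ET.fromstring(OLD_PAGE)))
print(wrapper.extract(ET.fromstring(NEW_PAGE)))
print(wrapper.path)
```

On the modified page the original path no longer matches anything, so the wrapper searches the new tree for the closest element and rewrites its own path. A realistic system would need to handle malformed HTML and compare whole subtrees rather than single elements; this sketch only illustrates the verify-then-repair loop.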


