Design of Automatically Adaptable Web Wrappers

03/07/2011
by Emilio Ferrara, et al.

Nowadays, the huge amount of information distributed through the Web motivates the study of techniques for extracting relevant data efficiently and reliably. Both academia and industry have developed several approaches to Web data extraction, for example using techniques from artificial intelligence or machine learning. Some commonly adopted procedures, namely wrappers, ensure a high degree of precision in the information extracted from Web pages and, at the same time, must prove robust so as not to compromise the quality and reliability of the data themselves. In this paper we focus on some experimental aspects related to the robustness of the data extraction process and the possibility of automatically adapting wrappers. We discuss the implementation of algorithms for finding similarities between two different versions of a Web page, in order to handle modifications, avoid the failure of data extraction tasks and ensure the reliability of the extracted information. Our purpose is to evaluate the performance, advantages and drawbacks of our novel system for automatic wrapper adaptation.
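The abstract does not say which similarity algorithm the system uses. As a rough illustration of what "finding similarities between two versions of a Web page" can look like, the sketch below assumes the classic Simple Tree Matching algorithm over simplified DOM trees. The `Node` class, the function names and the normalisation by average tree size are all hypothetical choices made for this example, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical, simplified DOM node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Return the size of the maximum order-preserving matching of two trees.

    Two subtrees can only match if their roots carry the same tag; children
    are then aligned with a dynamic program similar to longest common
    subsequence.
    """
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # table[i][j] = best matching between the first i children of a
    # and the first j children of b.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = table[i - 1][j - 1] + simple_tree_matching(
                a.children[i - 1], b.children[j - 1]
            )
            table[i][j] = max(table[i - 1][j], table[i][j - 1], paired)
    return table[m][n] + 1  # +1 counts the matched roots themselves


def tree_size(node: Node) -> int:
    """Count the nodes in a tree."""
    return 1 + sum(tree_size(child) for child in node.children)


def similarity(a: Node, b: Node) -> float:
    """Assumed normalisation: matched nodes divided by the average tree size."""
    return simple_tree_matching(a, b) / ((tree_size(a) + tree_size(b)) / 2)


# Two versions of the same page: the new one gains a <span> and an extra <li>.
old_page = Node("html", [Node("body", [
    Node("div", [Node("h1"), Node("p")]),
    Node("ul", [Node("li"), Node("li")]),
])])
new_page = Node("html", [Node("body", [
    Node("div", [Node("h1"), Node("p"), Node("span")]),
    Node("ul", [Node("li"), Node("li"), Node("li")]),
])])

matched = simple_tree_matching(old_page, new_page)
score = similarity(old_page, new_page)
```

A wrapper-adaptation step could compare such a score against a threshold to decide whether an element in the modified page still corresponds to the one the wrapper originally targeted. The threshold and the decision rule would be further assumptions beyond what the abstract states.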


