Automatic Wrapper Adaptation by Tree Edit Distance Matching

03/07/2011
by Emilio Ferrara, et al.

Information distributed through the Web keeps growing faster day by day, and for this reason several techniques for extracting Web data have been proposed in recent years. Extraction tasks are often performed by so-called wrappers, procedures that extract information from Web pages, e.g., by implementing logic-based techniques. Many application fields today require wrappers to be highly robust, so as not to compromise information assets or the reliability of the extracted data. Unfortunately, a wrapper may fail to extract data from a Web page if the page's structure changes, sometimes even slightly. This calls for new techniques that, in case of failure, automatically adapt the wrapper to the new structure of the page. In this work we present a novel approach to automatic wrapper adaptation based on measuring the similarity of trees through improved tree edit distance matching techniques.
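The abstract does not spell out the matching algorithm. As a hedged illustration of what "measuring the similarity of trees" can look like, the sketch below implements the classic Simple Tree Matching algorithm (Yang, 1991), a restricted top-down relative of tree edit distance; it is not necessarily the authors' improved variant. The names `Node`, `normalized_similarity`, and `relocate` are hypothetical, and normalizing by the average tree size is one common choice among several. `relocate` shows one plausible adaptation step: when the page changes, pick the subtree of the new page that is most similar to the node the wrapper used to extract.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Size of the maximum top-down matching between trees a and b (Yang's STM)."""
    if a.label != b.label:
        return 0  # roots differ, so no top-down matching is possible
    m, n = len(a.children), len(b.children)
    # M[i][j] = best matching between the first i children of a and first j of b
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i][j - 1],
                M[i - 1][j],
                M[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return M[m][n] + 1  # +1 for the matched roots


def size(t: Node) -> int:
    """Number of nodes in the tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def normalized_similarity(a: Node, b: Node) -> float:
    """Assumed normalization: matched nodes divided by the average tree size, in [0, 1]."""
    return 2 * simple_tree_matching(a, b) / (size(a) + size(b))


def iter_subtrees(t: Node) -> Iterator[Node]:
    """Yield every subtree of t in pre-order."""
    yield t
    for c in t.children:
        yield from iter_subtrees(c)


def relocate(old_target: Node, new_root: Node) -> Node:
    """Hypothetical adaptation step: most similar subtree of the changed page."""
    return max(iter_subtrees(new_root),
               key=lambda s: normalized_similarity(old_target, s))


# Usage: the page layout changed slightly between two crawls.
old = Node("html", [Node("body", [Node("div", [Node("span"), Node("p")])])])
new = Node("html", [Node("body", [Node("div", [Node("p")]), Node("ul")])])
print(simple_tree_matching(old, new))   # 4 matched nodes
print(normalized_similarity(old, new))  # 0.8
print(relocate(old.children[0].children[0], new).label)  # div
```

STM only matches nodes whose parents are also matched and never crosses sibling order, which keeps it a simple dynamic program. A general tree edit distance would also allow insertions and deletions of inner nodes, at higher computational cost.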

research
03/07/2011

Design of Automatically Adaptable Web Wrappers

Nowadays, the huge amount of information distributed through the Web mot...
research
04/27/2020

SFTM: Fast Comparison of Web Documents using Similarity-based Flexible Tree Matching

Tree matching techniques have been investigated in many fields, includin...
research
02/01/2022

WebFormer: The Web-page Transformer for Structure Information Extraction

Structure information extraction refers to the task of extracting struct...
research
05/08/2014

A Vague Improved Markov Model Approach for Web Page Prediction

Today most of the information in all areas is available over the web. It...
research
05/24/2022

PLAtE: A Large-scale Dataset for List Page Web Extraction

Recently, neural models have been leveraged to significantly improve the...
research
08/30/2021

Web Application Testing: Using Tree Kernels to Detect Near-duplicate States in Automated Model Inference

In the context of End-to-End testing of web applications, automated expl...
research
06/24/2011

Wrapper Maintenance: A Machine Learning Approach

The proliferation of online information sources has led to an increased ...