Infer XPath

11/05/2020
by Michał J. Gajda, et al.

We propose reformulating the discovery of data structure within a web page as finding relations between sets of document nodes. We start by recasting web page analysis as finding expressions in an extension of XPath. We then propose to discover these XPath expressions automatically with the InferXPath meta-language. Our goal is to automate the laborious conversion of manually created web pages that serve as software documentation, wikis, and reference documents into tabular data that can be fed directly into a data pipeline.
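
To make the target of the approach concrete, the sketch below shows what hand-written XPath-based extraction into tabular rows can look like. It is not InferXPath and does not infer anything: the page fragment, the `extract_rows` helper, and both path expressions are hypothetical and written by hand, which is exactly the manual step the abstract aims to automate. It uses only the limited XPath subset supported by Python's standard `xml.etree.ElementTree`, not the XPath extension the abstract refers to.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed documentation fragment (not taken from the paper).
PAGE = """
<html><body>
  <table id="options">
    <tr><th>Flag</th><th>Meaning</th></tr>
    <tr><td>-v</td><td>verbose output</td></tr>
    <tr><td>-q</td><td>quiet mode</td></tr>
  </table>
</body></html>
"""

def extract_rows(page, row_path, cell_path):
    """Select a set of row nodes, then a set of cell nodes relative to each row.

    Both paths are hand-written assumptions; rows with no matching cells
    (such as the header row, which only has <th> children) are skipped.
    """
    root = ET.fromstring(page)
    rows = []
    for row in root.findall(row_path):
        cells = [(cell.text or "").strip() for cell in row.findall(cell_path)]
        if cells:
            rows.append(cells)
    return rows

# One expression picks the record nodes, the other picks fields within a record.
rows = extract_rows(PAGE, ".//table[@id='options']/tr", "td")
for row in rows:
    print("\t".join(row))
```

The pair of expressions (one for records, one for fields relative to a record) is one simple way to read "relations between sets of document nodes"; the paper's actual formulation may differ.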


