On Extracting Data from HTML Tables

03/20/2019
by Juan C. Roldán, et al.
The Web provides a great deal of data in user-friendly tabular formats encoded in HTML. Information extractors are intended to extract those data as datasets that can feed business applications. Many proposals to implement them exist, which has motivated several previous surveys. Unfortunately, those surveys are outdated, and we do not think that updating them suffices, because they do not provide a good conceptual framework, a taxonomy of web tables, an analysis of the exact tasks involved, or a good comparison framework. This article presents a review of the literature that addresses each of these shortcomings, which we hope will be useful to both researchers and practitioners.
