Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents

08/23/2022
by Andrea Gemelli, et al.

Tables are widely used in many types of documents because they convey important information in a structured way. In scientific papers, tables summarize novel discoveries and experimental results, making research comparable and easy for scholars to understand. Several methods perform table analysis on document images, which loses useful information in the conversion from PDF files because OCR tools are prone to recognition errors, particularly for text inside tables. The main contribution of this work is to tackle table extraction by exploiting Graph Neural Networks. Node features are enriched with suitably designed representation embeddings, which help to distinguish not only tables from the other parts of the paper but also table cells from table headers. We experimentally evaluated the proposed approach on a new dataset obtained by merging the information provided in the PubLayNet and PubTables-1M datasets.
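
To make the idea of a graph over PDF content more concrete, here is a minimal sketch of how text boxes could become graph nodes whose features are enriched and then mixed with their neighbours' features. It is not the authors' implementation: the function names, the k-nearest-neighbour edges, the toy digit-ratio "embedding", and the sample boxes are all assumptions made for illustration only.

```python
# Illustrative sketch only: every name and feature below is hypothetical,
# not taken from the paper.
from math import hypot


def center(box):
    """Return the centre point of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def knn_edges(boxes, k=2):
    """Connect each box to its k nearest neighbours by centre distance."""
    centers = [center(b) for b in boxes]
    edges = {}
    for i, (xi, yi) in enumerate(centers):
        dists = sorted(
            (hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        edges[i] = [j for _, j in dists[:k]]
    return edges


def node_features(box, text):
    """Box width and height plus a toy text feature (fraction of digits)."""
    x0, y0, x1, y1 = box
    digits = sum(ch.isdigit() for ch in text) / max(len(text), 1)
    return [x1 - x0, y1 - y0, digits]


def message_pass(features, edges):
    """One mean-aggregation step: append the neighbours' mean to each node."""
    out = []
    for i, f in enumerate(features):
        nbrs = [features[j] for j in edges[i]]
        if nbrs:
            mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        else:
            mean = [0.0] * len(f)
        out.append(f + mean)
    return out


# Made-up 2x2 grid of text boxes standing in for a tiny table.
boxes = [(0, 0, 10, 2), (12, 0, 22, 2), (0, 3, 10, 5), (12, 3, 22, 5)]
texts = ["Method", "F1", "GNN", "12.5"]

feats = [node_features(b, t) for b, t in zip(boxes, texts)]
edges = knn_edges(boxes, k=2)
updated = message_pass(feats, edges)
```

In a trained model, the updated node vectors would be passed to a classifier that labels each node (for example as table cell, table header, or other text); the sketch stops before that step because the abstract does not specify it.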

research
08/23/2022

Data augmentation on graphs for table type classification

Tables are widely used in documents because of their compact and structu...
research
03/22/2019

Line-items and table understanding in structured documents

Table detection and extraction has been studied in the context of docume...
research
02/02/2023

CTE: A Dataset for Contextualized Table Extraction

Relevant information in documents is often summarized in tables, helping...
research
06/23/2021

ScanBank: A Benchmark Dataset for Figure Extraction from Scanned Electronic Theses and Dissertations

We focus on electronic theses and dissertations (ETDs), aiming to improv...
research
07/03/2022

DiSCoMaT: Distantly Supervised Composition Extraction from Tables in Materials Science Articles

A crucial component in the curation of KB for a scientific domain is inf...
research
09/06/2021

Text-to-Table: A New Way of Information Extraction

We study a new problem setting of information extraction (IE), referred ...
research
04/07/2021

The quantification of Simpsons paradox and other contributions to contingency table theory

The analysis of contingency tables is a powerful statistical tool used i...
