Data augmentation on graphs for table type classification

08/23/2022
by Davide del Bimbo, et al.

Tables are widely used in documents because they present information in a compact, structured form. In scientific papers in particular, tables summarize novel findings and experimental results, making research comparable and easy for scholars to understand. Because table layouts vary widely, it is useful to interpret their content and classify them into categories. Such classification could support extracting information directly from scientific papers, for instance to compare the performance of models using the result tables in their papers. In this work, we address table classification with a Graph Neural Network, exploiting the table structure in the message passing algorithm. We evaluate our model on a subset of the Tab2Know dataset. Since it contains few manually annotated examples, we propose data augmentation techniques that operate directly on the table graph structures. We achieve promising preliminary results and propose a data augmentation method suited to graph-based table representations.
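
The abstract does not specify how tables are turned into graphs or which augmentations are applied, so the following is only an illustrative sketch under stated assumptions: cells become nodes, adjacent cells are linked by edges, one round of mean-aggregation message passing stands in for a Graph Neural Network layer, and dropping a random body row stands in for a graph-level augmentation. The function names (`table_to_graph`, `drop_row`, `mean_message_passing`) and the "is numeric" cell feature are hypothetical and are not taken from the paper.

```python
import random


def table_to_graph(table):
    """Build a grid graph: one node per cell, edges between adjacent cells.

    Assumption: the table is a rectangular list of rows of strings.
    """
    n_rows, n_cols = len(table), len(table[0])
    nodes = {(r, c): table[r][c] for r in range(n_rows) for c in range(n_cols)}
    edges = set()
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:  # horizontal neighbour
                edges.add(((r, c), (r, c + 1)))
            if r + 1 < n_rows:  # vertical neighbour
                edges.add(((r, c), (r + 1, c)))
    return nodes, edges


def drop_row(table, rng, keep_header=True):
    """Hypothetical augmentation: remove one randomly chosen body row.

    Returns a new table; the graph is then rebuilt with table_to_graph.
    If at most one body row exists, the table is returned unchanged.
    """
    start = 1 if keep_header else 0
    if len(table) - start < 2:
        return [list(row) for row in table]
    idx = rng.randrange(start, len(table))
    return [list(row) for i, row in enumerate(table) if i != idx]


def is_numeric(text):
    """Toy node feature: 1.0 if the cell parses as a number, else 0.0."""
    try:
        float(text)
        return 1.0
    except ValueError:
        return 0.0


def mean_message_passing(features, edges):
    """One round of mean aggregation over each node and its neighbours."""
    neighbours = {node: [node] for node in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    return {
        node: sum(features[m] for m in members) / len(members)
        for node, members in neighbours.items()
    }


if __name__ == "__main__":
    table = [["Model", "Acc"], ["A", "0.9"], ["B", "0.8"], ["C", "0.7"]]
    augmented = drop_row(table, random.Random(0))
    nodes, edges = table_to_graph(augmented)
    features = {node: is_numeric(text) for node, text in nodes.items()}
    print(mean_message_passing(features, edges))
```

Applying the augmentation to the table before rebuilding the graph keeps the edges consistent with the remaining cells; editing nodes and edges directly would be an equally plausible design, and the abstract does not say which approach the authors take.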


