Complicated Table Structure Recognition

by Zewen Chi, et al.

Table structure recognition aims to recover the internal structure of a table, a key step toward making machines understand tables. Many studies have addressed this task for file formats such as ASCII text and HTML, and recognizing table structures in PDF files has also attracted considerable attention. However, existing methods struggle to accurately recognize the structure of complicated tables in PDF files, that is, tables containing spanning cells that occupy at least two columns or rows. To address this issue, we propose GraphTSR, a novel graph neural network for recognizing table structure in PDF files. Specifically, it takes table cells as input and recognizes the table structure by predicting relations among cells. Moreover, to better evaluate the task, we construct SciTSR, a large-scale table structure recognition dataset built from scientific papers, which contains 15,000 tables from PDF files and their corresponding structure labels. Extensive experiments demonstrate that our proposed model is highly effective for complicated tables and outperforms state-of-the-art baselines on both a benchmark dataset and our newly constructed dataset.
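To make the idea of "predicting relations among cells" more concrete, the following is a minimal sketch of how cells and pairwise relation labels might be represented. It is not the GraphTSR model and is not taken from the paper: the `Cell` fields, the three label names (`horizontal`, `vertical`, `none`), and the adjacency rule used to derive them are all assumptions made for illustration. It only shows how a table with spanning cells could be turned into labeled cell pairs of the kind a relation classifier would be trained to predict.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cell:
    """A table cell with inclusive row/column ranges (hypothetical schema)."""
    text: str
    start_row: int
    end_row: int
    start_col: int
    end_col: int

    def is_spanning(self) -> bool:
        # A spanning cell occupies at least two rows or at least two columns.
        return self.end_row > self.start_row or self.end_col > self.start_col


def _overlap(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> bool:
    """True if the inclusive ranges [a_lo, a_hi] and [b_lo, b_hi] intersect."""
    return a_lo <= b_hi and b_lo <= a_hi


def relation(a: Cell, b: Cell) -> str:
    """Assumed labeling rule: cells are related only if they are adjacent."""
    rows_overlap = _overlap(a.start_row, a.end_row, b.start_row, b.end_row)
    cols_overlap = _overlap(a.start_col, a.end_col, b.start_col, b.end_col)
    # Side-by-side cells that share at least one row.
    if rows_overlap and (a.end_col + 1 == b.start_col or b.end_col + 1 == a.start_col):
        return "horizontal"
    # Stacked cells that share at least one column.
    if cols_overlap and (a.end_row + 1 == b.start_row or b.end_row + 1 == a.start_row):
        return "vertical"
    return "none"


def label_edges(cells: list[Cell]) -> dict[tuple[int, int], str]:
    """Label every unordered pair of cells, keyed by their list indices."""
    return {
        (i, j): relation(cells[i], cells[j])
        for i, j in combinations(range(len(cells)), 2)
    }


# A small header with two spanning cells (illustrative data only).
cells = [
    Cell("Method", 0, 1, 0, 0),  # spans two rows
    Cell("Score", 0, 0, 1, 2),   # spans two columns
    Cell("P", 1, 1, 1, 1),
    Cell("R", 1, 1, 2, 2),
]
edges = label_edges(cells)
```

In this toy header, "Method" is horizontally related to both "Score" and "P" because it spans both rows, while "Score" is vertically related to both "P" and "R" because it spans both columns. A row-and-column grid cannot express these one-to-many adjacencies directly, which is one way to see why a graph over cells suits tables with spanning cells.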


TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition

A table arranging data in rows and columns is a very effective data stru...

Parsing Table Structures in the Wild

This paper tackles the problem of table structure parsing (TSP) from ima...

Robust Table Detection and Structure Recognition from Heterogeneous Document Images

We introduce a new table detection and structure recognition approach na...

TSRFormer: Table Structure Recognition with Transformers

We present a new table structure recognition (TSR) approach, called TSRF...

TabLeX: A Benchmark Dataset for Structure and Content Information Extraction from Scientific Tables

Information Extraction (IE) from the tables present in scientific articl...

GFTE: Graph-based Financial Table Extraction

Tabular data is a crucial form of information expression, which can orga...

Table-to-Text Natural Language Generation with Unseen Schemas

Traditional table-to-text natural language generation (NLG) tasks focus ...