Rethinking Image-based Table Recognition Using Weakly Supervised Methods

03/14/2023
by Nam Tuan Ly, et al.

Most previous methods for table recognition rely on training datasets containing many richly annotated table images. Detailed table image annotation, e.g., cell or text bounding box annotation, is however costly and often subjective. In this paper, we propose a weakly supervised model named WSTabNet for table recognition that relies only on HTML (or LaTeX) code-level annotations of table images. The proposed model consists of three main parts: an encoder for feature extraction, a structure decoder for generating the table structure, and a cell decoder for predicting the content of each cell in the table. Our system is trained end-to-end by stochastic gradient descent, requiring only table images and their ground-truth HTML (or LaTeX) representations. To facilitate table recognition with deep learning, we create and release WikiTableSet, the largest publicly available image-based table recognition dataset built from Wikipedia. WikiTableSet contains nearly 4 million English table images, 590K Japanese table images, and 640K French table images with corresponding HTML representations and cell bounding boxes. Extensive experiments on WikiTableSet and two other large-scale datasets, FinTabNet and PubTabNet, demonstrate that the proposed weakly supervised model achieves accuracy better than or similar to that of state-of-the-art models on all benchmark datasets.
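To make the idea of code-level supervision concrete, the sketch below shows one way a ground-truth HTML string could be split into a structure token sequence (a plausible target for a structure decoder) and a list of per-cell texts (plausible targets for a cell decoder). This is an illustrative assumption, not the tokenization used by WSTabNet, which the abstract does not specify; the names `TableTargetSplitter` and `split_targets` are hypothetical.

```python
from html.parser import HTMLParser


class TableTargetSplitter(HTMLParser):
    """Hypothetical helper: split table HTML into structure tokens and cell texts."""

    # Tags treated as table structure; anything else inside a cell is ignored here.
    STRUCTURE_TAGS = {"table", "thead", "tbody", "tr", "td", "th"}

    def __init__(self):
        super().__init__()
        self.structure = []    # token sequence describing the table layout
        self.cells = []        # one text string per cell, in reading order
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag not in self.STRUCTURE_TAGS:
            return
        # Keep span attributes, since they are part of the table structure.
        spans = "".join(
            f' {name}="{value}"'
            for name, value in attrs
            if name in ("rowspan", "colspan")
        )
        self.structure.append(f"<{tag}{spans}>")
        if tag in ("td", "th"):
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag not in self.STRUCTURE_TAGS:
            return
        self.structure.append(f"</{tag}>")
        if tag in ("td", "th") and self._in_cell:
            self._in_cell = False
            self.cells[-1] = self.cells[-1].strip()

    def handle_data(self, data):
        # Text is only collected while inside a cell.
        if self._in_cell:
            self.cells[-1] += data


def split_targets(html):
    """Return (structure_tokens, cell_texts) for one table annotation."""
    parser = TableTargetSplitter()
    parser.feed(html)
    parser.close()
    return parser.structure, parser.cells


if __name__ == "__main__":
    example = (
        '<table><tr><td colspan="2">Year</td></tr>'
        "<tr><td>2022</td><td>2023</td></tr></table>"
    )
    structure, cells = split_targets(example)
    print(structure)
    print(cells)
```

Under this assumed split, no cell or text bounding boxes are needed: both target sequences come directly from the HTML annotation paired with the table image.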


