TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images

01/06/2020
by Shubham Paliwal, et al.

With the widespread use of mobile phones and scanners to photograph and upload documents, the need to extract the information trapped in unstructured document images such as retail receipts, insurance claim forms and financial invoices is becoming more acute. A major hurdle to this objective is that these images often contain information in the form of tables, and extracting data from tabular sub-images presents a unique set of challenges. These include accurately detecting the tabular region within an image, and subsequently detecting and extracting information from the rows and columns of the detected table. While some progress has been made in table detection, extracting the table contents is still a challenge since it requires more fine-grained recognition of table structure (rows and columns). Prior approaches have attempted to solve the table detection and structure recognition problems independently using two separate models. In this paper, we propose TableNet: a novel end-to-end deep learning model for both table detection and structure recognition. The model exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions. This is followed by semantic rule-based row extraction from the identified tabular sub-regions. The proposed model and extraction approach were evaluated on the publicly available ICDAR 2013 and Marmot Table datasets, obtaining state-of-the-art results. Additionally, we demonstrate that feeding additional semantic features further improves model performance and that the model exhibits transfer learning across datasets. Another contribution of this paper is to provide additional table structure annotations for the Marmot data, which currently only has annotations for table detection.
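The abstract does not spell out the row-extraction rules, so the following is only a rough illustration of the general idea, not the authors' implementation. It assumes that word bounding boxes are already available (for example from an OCR step) and that the column regions have been reduced to horizontal pixel spans. The `Word` class, the `group_rows` and `to_cells` functions, and the `tol` parameter are all hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognised word and its bounding box in pixels (hypothetical type)."""
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


def group_rows(words, tol=5):
    """Group words into rows: a word joins the current row when its vertical
    centre lies within `tol` pixels of that row's first word (assumed rule)."""
    rows = []
    for w in sorted(words, key=lambda w: (w.y0 + w.y1) / 2):
        cy = (w.y0 + w.y1) / 2
        if rows and abs(cy - rows[-1]["cy"]) <= tol:
            rows[-1]["words"].append(w)
        else:
            rows.append({"cy": cy, "words": [w]})
    return [r["words"] for r in rows]


def to_cells(words, columns, tol=5):
    """Assign each word to the column span containing its horizontal centre.

    `columns` is a list of (x_start, x_end) spans, assumed to be derived
    from a predicted column mask. Words outside every span are dropped.
    """
    table = []
    for row in group_rows(words, tol):
        cells = [[] for _ in columns]
        for w in sorted(row, key=lambda w: w.x0):
            cx = (w.x0 + w.x1) / 2
            for i, (start, end) in enumerate(columns):
                if start <= cx < end:
                    cells[i].append(w.text)
                    break
        table.append([" ".join(c) for c in cells])
    return table


words = [
    Word("Item", 10, 10, 50, 20),
    Word("Price", 110, 11, 160, 21),
    Word("Tea", 10, 40, 40, 50),
    Word("3.50", 110, 41, 150, 51),
]
table = to_cells(words, columns=[(0, 100), (100, 200)])
```

A fixed pixel tolerance is the simplest possible grouping rule; real documents with multi-line cells or skewed scans would need something more robust.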


research
04/27/2020

CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents

An automatic table recognition method for interpretation of tabular data...
research
02/20/2021

Deep Structured Feature Networks for Table Detection and Tabular Data Extraction from Scanned Financial Document Images

Automatic table detection in PDF documents has achieved a great success ...
research
04/21/2021

Guided Table Structure Recognition through Anchor Optimization

This paper presents the novel approach towards table structure recogniti...
research
10/06/2021

On Cropped versus Uncropped Training Sets in Tabular Structure Detection

Automated document processing for tabular information extraction is high...
research
05/25/2021

Tab.IAIS: Flexible Table Recognition and Semantic Interpretation System

Table extraction is an important but still unsolved problem. In this pap...
research
04/29/2021

TabAug: Data Driven Augmentation for Enhanced Table Structure Recognition

Table Structure Recognition is an essential part of end-to-end tabular d...
research
07/14/2022

DEXTER: An end-to-end system to extract table contents from electronic medical health documents

In this paper, we propose DEXTER, an end to end system to extract inform...
