DEXTER: An end-to-end system to extract table contents from electronic medical health documents

07/14/2022
by Nandhinee PR, et al.

In this paper, we propose DEXTER, an end-to-end system to extract information from tables present in medical health documents, such as electronic health records (EHR) and explanation of benefits (EOB). DEXTER consists of four sub-system stages: i) table detection; ii) table type classification; iii) cell detection; and iv) cell content extraction. We propose a two-stage transfer learning-based approach using the CDeC-Net architecture along with non-maximum suppression for table detection. We design a conventional computer vision-based approach for table type classification and cell detection, using kernels parameterized by image size to detect rows and columns. Finally, we extract the text from the detected cells using the existing OCR engine Tesseract. To evaluate our system, we manually annotated a sample of a real-world medical dataset (referred to as Meddata) consisting of documents that vary widely in appearance and cover different table structures, such as bordered, partially bordered, borderless, or coloured tables. We experimentally show that DEXTER outperforms the commercially available Amazon Textract and Microsoft Azure Form Recognizer systems on the annotated real-world medical dataset.
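The abstract names non-maximum suppression as a post-processing step for table detection but gives no details. The following is a minimal, dependency-free sketch of the generic greedy form of that technique, not the authors' implementation. The function names, the `(x1, y1, x2, y2)` box format, and the 0.5 IoU threshold are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # hypothetical value, not taken from the paper
) -> List[int]:
    """Return indices of the boxes kept, in order of decreasing score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a candidate only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one table plus one separate table.
candidate_boxes = [(10, 10, 200, 100), (12, 12, 198, 102), (300, 50, 500, 150)]
candidate_scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(candidate_boxes, candidate_scores)
print(kept)
```

In this example the second box overlaps the first almost entirely, so only the higher-scoring of the two is kept, along with the separate third box.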


research
01/06/2020

TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images

With the widespread use of mobile phones and scanners to photograph and ...
research
04/27/2020

CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents

An automatic table recognition method for interpretation of tabular data...
research
06/25/2021

TableSense: Spreadsheet Table Detection with Convolutional Neural Networks

Spreadsheet table detection is the task of detecting all tables on a giv...
research
03/27/2023

A large-scale dataset for end-to-end table recognition in the wild

Table recognition (TR) is one of the research hotspots in pattern recogn...
research
04/03/2019

Extracting Tables from Documents using Conditional Generative Adversarial Networks and Genetic Algorithms

Extracting information from tables in documents presents a significant c...
research
10/16/2020

A Conglomerate of Multiple OCR Table Detection and Extraction

Information representation as tables are compact and concise method that...
research
08/11/2022

Handling big tabular data of ICT supply chains: a multi-task, machine-interpretable approach

Due to the characteristics of Information and Communications Technology ...
