Tables to LaTeX: structure and content extraction from scientific tables

10/31/2022
by Pratik Kayal, et al.

Scientific documents contain tables that list important information in a concise fashion. Extracting structure and content from tables embedded in PDF research documents is challenging because of visual features such as spanning cells and content features such as mathematical symbols and equations. Most existing table structure identification methods tend to ignore these features of academic writing. In this paper, we adapt the transformer-based language modeling paradigm for scientific table structure and content extraction. Specifically, the proposed model converts a table image to its corresponding LaTeX source code. Overall, we outperform the current state-of-the-art baselines and achieve an exact match accuracy of 70.35% and 49.69% on structure and content extraction, respectively. Further analysis demonstrates that the proposed models efficiently identify the number of rows and columns, the alphanumeric characters, the LaTeX tokens, and symbols.
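To make the reported metric concrete, the following is a minimal sketch of how an exact match accuracy over predicted LaTeX sequences could be computed. It is not the authors' evaluation code: the whitespace tokenization and the names `tokenize` and `exact_match_accuracy` are assumptions introduced here for illustration only.

```python
from typing import List, Sequence


def tokenize(latex: str) -> List[str]:
    """Split a LaTeX string on whitespace (an assumed, simplistic tokenizer)."""
    return latex.split()


def exact_match_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Return the percentage of predictions whose token sequence equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # A prediction counts only if every token matches, in order.
    matches = sum(
        tokenize(pred) == tokenize(ref)
        for pred, ref in zip(predictions, references)
    )
    return 100.0 * matches / len(references)


if __name__ == "__main__":
    refs = [
        r"\begin{tabular}{cc} a & b \\ \end{tabular}",
        r"\begin{tabular}{c} x \\ \end{tabular}",
    ]
    preds = [
        r"\begin{tabular}{cc} a & b \\ \end{tabular}",
        r"\begin{tabular}{c} y \\ \end{tabular}",
    ]
    print(exact_match_accuracy(preds, refs))
```

Under this definition a single wrong token makes the whole table count as a miss, which is why exact match is a strict measure for long LaTeX outputs.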


