DocSegTr: An Instance-Level End-to-End Document Image Segmentation Transformer

01/27/2022
by Sanket Biswas, et al.

Understanding documents with rich layouts is an essential step towards information extraction. Business intelligence processes often require extracting useful semantic content from documents at a large scale for subsequent decision-making tasks. In this context, instance-level segmentation of different document objects (titles, sections, figures, tables, and so on) has emerged as an interesting problem for the document layout analysis community. To advance research in this direction, we present a transformer-based model for end-to-end segmentation of complex layouts in document images. To our knowledge, this is the first work on transformer-based document segmentation. Extensive experimentation on the PubLayNet dataset shows that our model achieves comparable or better segmentation performance than existing state-of-the-art approaches. We hope our simple and flexible framework can serve as a promising baseline for instance-level recognition tasks in document images.
