Performance Enhancement Leveraging Mask-RCNN on Bengali Document Layout Analysis

08/21/2023
by   Shrestha Datta, et al.

Understanding digital documents, especially historical ones, is like solving a puzzle. Document Layout Analysis (DLA) helps with this puzzle by dividing documents into sections such as paragraphs, images, and tables. This is crucial for machines to read and understand these documents. In the DL Sprint 2.0 competition, we worked on understanding Bangla documents. We used a dataset called BaDLAD, which contains a large number of examples. We trained a Mask R-CNN model to help with this understanding. We improved this model through step-by-step hyperparameter tuning and achieved a good Dice score of 0.889. However, not everything went perfectly. We tried using a model trained on English documents, but it did not transfer well to Bangla. This showed us that each language has its own challenges. Our solution for the DL Sprint 2.0 is publicly available at https://www.kaggle.com/competitions/dlsprint2/discussion/432201 along with notebooks, weights, and an inference notebook.
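
For readers unfamiliar with the metric, the Dice score mentioned above measures the overlap between a predicted mask and a ground-truth mask as twice the intersection divided by the sum of the two mask areas. The following is a minimal sketch for binary masks using NumPy. The function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions made for illustration; the competition's exact evaluation procedure (for example, how scores are averaged across classes and images) is not described here and may differ.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape.

    Hypothetical helper for illustration only; not the competition's scorer.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    # Pixels that are foreground in both masks.
    intersection = np.logical_and(pred, target).sum()
    # Sum of the foreground areas of the two masks.
    total = pred.sum() + target.sum()

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return float(2.0 * intersection / total)

# Toy example: each mask has 3 foreground pixels, 2 of which overlap.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
score = dice_score(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no foreground pixels.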

