WeLayout: WeChat Layout Analysis System for the ICDAR 2023 Competition on Robust Layout Segmentation in Corporate Documents

05/11/2023
by Mingliang Zhang, et al.

In this paper, we introduce WeLayout (WeChat Layout Analysis System), a novel system for segmenting the layout of corporate documents, developed for the ICDAR 2023 Competition on Robust Layout Segmentation. Our approach uses an ensemble of DINO and YOLO models. It significantly surpasses the competition baseline and secures a top position on the leaderboard with a mAP of 70.0. To achieve this performance, we focused on improving several aspects of the task: dataset augmentation, model architecture, bounding box refinement, and model ensembling. We also trained separately on each document category to raise the mean submission score, and we developed a cell-matching algorithm to further improve performance. To find the optimal ensemble weights and IoU thresholds, we used the Tree-Structured Parzen Estimator (TPE), a Bayesian optimization algorithm. Our results demonstrate the benefit of combining query-based and anchor-free models for robust layout segmentation in corporate documents.
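The abstract does not specify how the DINO and YOLO predictions are merged, only that the ensemble has per-model weights and IoU thresholds that TPE tunes. The sketch below assumes a simplified weighted-box-fusion (WBF) style rule to show where those two parameters enter; `Box`, `iou`, `fuse_boxes`, and the score normalisation are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Box:
    """Axis-aligned box (x1, y1, x2, y2) with a confidence score and class label."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    label: int


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def _weighted_mean(members: List[Box]) -> Box:
    """Score-weighted average of member coordinates; score is the summed weighted score."""
    total = sum(m.score for m in members)
    return Box(
        x1=sum(m.x1 * m.score for m in members) / total,
        y1=sum(m.y1 * m.score for m in members) / total,
        x2=sum(m.x2 * m.score for m in members) / total,
        y2=sum(m.y2 * m.score for m in members) / total,
        score=total,
        label=members[0].label,
    )


def fuse_boxes(
    predictions: Sequence[List[Box]], weights: Sequence[float], iou_thr: float
) -> List[Box]:
    """Simplified WBF-style fusion of several models' predictions for one page.

    Assumption (hypothetical helper): predictions[m] are the boxes from model m,
    weights[m] is that model's ensemble weight, and iou_thr is the overlap
    needed to merge two same-class boxes.
    """
    # Pool every box, scaling its confidence by its model's weight; drop zero-score boxes.
    pooled = [
        Box(b.x1, b.y1, b.x2, b.y2, b.score * w, b.label)
        for boxes, w in zip(predictions, weights)
        for b in boxes
        if b.score * w > 0
    ]
    pooled.sort(key=lambda b: b.score, reverse=True)

    clusters: List[List[Box]] = []
    fused: List[Box] = []
    for b in pooled:
        for i, f in enumerate(fused):
            # Merge into the first same-class fused box that overlaps enough.
            if f.label == b.label and iou(f, b) >= iou_thr:
                clusters[i].append(b)
                fused[i] = _weighted_mean(clusters[i])
                break
        else:
            # No match: start a new cluster with this box.
            clusters.append([b])
            fused.append(b)

    # Normalise by the total weight, so a box found by only some models scores lower.
    w_sum = sum(weights)
    out = [Box(f.x1, f.y1, f.x2, f.y2, f.score / w_sum, f.label) for f in fused]
    return sorted(out, key=lambda b: b.score, reverse=True)


# Example: two models agree on a class-0 region; only the second sees a class-1 region.
model_a = [Box(0, 0, 10, 10, 0.9, 0)]
model_b = [Box(1, 1, 11, 11, 0.6, 0), Box(50, 50, 60, 60, 0.5, 1)]
fused = fuse_boxes([model_a, model_b], weights=[2.0, 1.0], iou_thr=0.5)
```

In this setup, `weights` and `iou_thr` are exactly the knobs a TPE search would treat as variables, maximising mAP on a validation split. A TPE implementation such as Optuna's `TPESampler` could drive that search, though the abstract does not say which implementation the authors used.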
