We propose a novel end-to-end document understanding model called SeRum ...
Text recognition in the wild is a long-standing problem in computer visi...
Visual information extraction (VIE), which aims to simultaneously perfor...
Recently, the Table Structure Recognition (TSR) task, aiming at identifying ...
The recent large-scale Contrastive Language-Image Pretraining (CLIP) mod...
As textual attributes like font are core design elements of document for...
Document Information Extraction (DIE) has attracted increasing attention...
Scene segmentation and classification (SSC) serve as a critical step tow...
The task of Grammatical Error Correction (GEC) has received remarkable a...
Relational understanding is critical for a number of visually-rich docum...
The self-supervised Masked Image Modeling (MIM) schema, following "mask-...
Recently, table structure recognition has achieved impressive progress w...
Recently, Vision Transformers (ViT), with the self-attention (SA) as the...
Recently, a series of decomposition-based scene text detection methods h...