XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding

03/14/2022
by Zhangxuan Gu, et al.

Recently, various multimodal networks for Visually-Rich Document Understanding (VRDU) have been proposed, showing that transformers benefit from integrating visual and layout information with the text embeddings. However, most existing approaches use position embeddings to encode sequence order, neglecting the noisy and often improper reading order produced by OCR tools. In this paper, we propose a robust layout-aware multimodal network named XYLayoutLM to capture and leverage rich layout information from proper reading orders produced by our Augmented XY Cut. Moreover, we propose a Dilated Conditional Position Encoding module to handle input sequences of variable length; it additionally extracts local layout information from both the textual and visual modalities while generating position embeddings. Experimental results show that XYLayoutLM achieves competitive results on document understanding tasks.
