MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding

10/16/2021
by Junlong Li, et al.

Multimodal pre-training with text, layout, and image has made significant progress for Visually-rich Document Understanding (VrDU), especially for fixed-layout documents such as scanned document images. However, a large number of digital documents have no fixed layout: their layout must be rendered interactively and dynamically for display, which makes existing layout-based pre-training approaches difficult to apply. In this paper, we propose MarkupLM for document understanding tasks on documents whose backbone is a markup language, such as HTML/XML-based documents, where text and markup information are jointly pre-trained. Experimental results show that the pre-trained MarkupLM significantly outperforms existing strong baseline models on several document understanding tasks. The pre-trained model and code will be publicly available at https://aka.ms/markuplm.
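
To make "text and markup information" concrete for an HTML document, the sketch below pairs every non-empty text node with an XPath-like path of the tags that enclose it, using 1-based sibling indices. This is only an illustration under that assumption, not MarkupLM's actual preprocessing; the class `TextPathExtractor` and the function `extract` are hypothetical names, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TextPathExtractor(HTMLParser):
    """Hypothetical helper: collect (text, xpath-like path) pairs from HTML."""

    def __init__(self):
        super().__init__()
        self.stack = []           # open elements as (tag, sibling_index)
        self.child_counts = [{}]  # per open element: tag -> children seen so far
        self.pairs = []           # collected (text, path) tuples

    def handle_starttag(self, tag, attrs):
        # Count this tag among its same-name siblings under the current parent.
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        if tag in VOID:
            return
        self.stack.append((tag, counts[tag]))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if any(t == tag for t, _ in self.stack):
            while self.stack:
                t, _ = self.stack.pop()
                self.child_counts.pop()
                if t == tag:
                    break

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only nodes
            path = "".join(f"/{t}[{i}]" for t, i in self.stack)
            self.pairs.append((text, path))


def extract(html):
    """Return a list of (text, path) pairs for one HTML string."""
    parser = TextPathExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

For example, `extract("<html><body><div><span>Price</span><span>$10</span></div></body></html>")` gives `"Price"` the path `/html[1]/body[1]/div[1]/span[1]` and `"$10"` the path `/html[1]/body[1]/div[1]/span[2]`. The sibling indices keep the two text nodes distinguishable even though they share the same tag sequence, and no rendering step is needed to obtain this structural signal.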


