LayoutLM: Pre-training of Text and Layout for Document Image Understanding

12/31/2019
by Yiheng Xu, et al.
Pre-training techniques have proven successful across a variety of NLP tasks in recent years. Although pre-trained models are widely used in NLP applications, they focus almost exclusively on text-level manipulation and neglect the layout and style information that is vital for document image understanding. In this paper, we propose LayoutLM to jointly model the interactions between text and layout information across scanned document images, which benefits many real-world document image understanding tasks such as information extraction from scanned documents. We also leverage image features to incorporate the style information of words into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pre-training, leading to significant performance improvements in downstream document image understanding tasks.
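
As a rough illustration of what jointly modeling text and layout can mean at the input level, the sketch below adds a position vector for each bounding-box coordinate to a word vector. It is a toy example under stated assumptions, not the paper's implementation: the 0 to 1000 coordinate grid, the shared x and y lookup tables, the random initialization, and the names `normalize_box`, `make_table`, and `embed_tokens` are all chosen here for illustration.

```python
import random

HIDDEN = 8          # toy embedding width (assumption)
GRID = 1000         # coordinates are scaled to integers in 0..GRID (assumption)
VOCAB = 16          # toy vocabulary size (assumption)


def normalize_box(box, page_width, page_height, grid=GRID):
    """Scale a pixel box (x0, y0, x1, y1) to the integer grid 0..grid."""
    x0, y0, x1, y1 = box

    def scale(value, extent):
        # Clamp so boxes slightly outside the page still index the table.
        return max(0, min(grid, int(grid * value / extent)))

    return (scale(x0, page_width), scale(y0, page_height),
            scale(x1, page_width), scale(y1, page_height))


def make_table(rows, dim, rng):
    """Build a randomly initialized lookup table of shape rows x dim."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]


def embed_tokens(token_ids, boxes, tables):
    """Sum the word vector and four coordinate vectors for each token."""
    embedded = []
    for token_id, (x0, y0, x1, y1) in zip(token_ids, boxes):
        parts = [
            tables["word"][token_id],
            tables["x"][x0], tables["y"][y0],   # upper-left corner
            tables["x"][x1], tables["y"][y1],   # lower-right corner
        ]
        embedded.append([sum(values) for values in zip(*parts)])
    return embedded


rng = random.Random(0)
tables = {
    "word": make_table(VOCAB, HIDDEN, rng),
    "x": make_table(GRID + 1, HIDDEN, rng),
    "y": make_table(GRID + 1, HIDDEN, rng),
}

# Two hypothetical OCR tokens on a 500 x 1000 pixel page.
boxes = [normalize_box(b, 500, 1000) for b in [(50, 100, 150, 120), (160, 100, 240, 120)]]
vectors = embed_tokens([3, 7], boxes, tables)
print(len(vectors), len(vectors[0]))
```

In a real model the tables would be learned during pre-training and the summed vectors would feed a Transformer encoder; the sketch stops at the input embedding.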

research
05/24/2021

StructuralLM: Structural Pre-training for Form Understanding

Large pre-trained language models achieve state-of-the-art results when ...
research
10/12/2022

ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding

Recent years have witnessed the rise and success of pre-training techniq...
research
05/16/2023

Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding

This paper presents GenDoc, a general sequence-to-sequence document unde...
research
07/16/2021

The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues

Large, pre-trained transformer models like BERT have achieved state-of-t...
research
06/15/2023

Document Entity Retrieval with Massive and Noisy Pre-training

Visually-Rich Document Entity Retrieval (VDER) is a type of machine lear...
research
06/14/2022

RDU: A Region-based Approach to Form-style Document Understanding

Key Information Extraction (KIE) is aimed at extracting structured infor...
research
09/02/2021

Skim-Attention: Learning to Focus via Document Layout

Transformer-based pre-training techniques of text and layout have proven...
