StructuralLM: Structural Pre-training for Form Understanding

05/24/2021
by Chenliang Li et al.

Large pre-trained language models achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, they almost exclusively focus on text-only representation, while neglecting cell-level layout information that is important for form image understanding. In this paper, we propose a new pre-training approach, StructuralLM, to jointly leverage cell and layout information from scanned documents. Specifically, we pre-train StructuralLM with two new designs to make the most of the interactions of cell and layout information: 1) each cell as a semantic unit; 2) classification of cell positions. The pre-trained StructuralLM achieves new state-of-the-art results in different types of downstream tasks, including form understanding (from 78.95 to 85.14), document visual question answering (from 72.59 to 83.94) and document image classification (from 94.43 to 96.08).
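To make the second design concrete, a minimal sketch of cell-position classification follows: the page is divided into an N x N grid of areas and each cell is assigned the label of the area its centre falls in, so the model can be trained to predict that label. The grid size, the coordinate convention, and the function name are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch of cell-position classification: map a cell's bounding
# box to one of N*N grid-area labels. Grid size and coordinates are
# illustrative assumptions.

def cell_position_label(bbox, page_w, page_h, n=4):
    """Return an integer area label in [0, n*n) for a cell bounding box.

    bbox: (x0, y0, x1, y1) in page coordinates, origin at top-left.
    """
    x0, y0, x1, y1 = bbox
    # Use the cell's centre point to decide which grid area it falls in.
    cx = (x0 + x1) / 2.0
    cy = (y0 + y1) / 2.0
    col = min(int(cx / page_w * n), n - 1)  # clamp boxes on the right edge
    row = min(int(cy / page_h * n), n - 1)  # clamp boxes on the bottom edge
    return row * n + col

# Because StructuralLM treats each cell as a semantic unit, every token
# in a cell would share the cell's 2D position and hence the same label.
```

For example, on a 100 x 100 page with a 4 x 4 grid, a cell in the top-left corner gets label 0 and one in the bottom-right corner gets label 15.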


Related research

- LayoutLM: Pre-training of Text and Layout for Document Image Understanding (12/31/2019)
- The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues (07/16/2021)
- LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding (05/30/2023)
- Efficient Document Image Classification Using Region-Based Graph Neural Network (06/25/2021)
- BROS: A Layout-Aware Pre-trained Language Model for Understanding Documents (08/10/2021)
- Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding (05/16/2023)
- Plug-and-Play Document Modules for Pre-trained Models (05/28/2023)
