BROS: A Layout-Aware Pre-trained Language Model for Understanding Documents

08/10/2021 ∙ by Teakgyu Hong, et al. ∙ KAIST Department of Mathematical Sciences ∙ NAVER Corp.

Understanding documents from their visual snapshots is an emerging problem that requires both advanced computer vision and NLP methods. Recent advances in OCR enable the accurate recognition of text blocks, yet it is still challenging to extract key information from documents due to the diversity of their layouts. Although recent studies on pre-trained language models show the importance of incorporating layout information on this task, the conjugation of texts and their layouts still follows the style of BERT, which is optimized for understanding 1D text. This implies there is room for further improvement considering the 2D nature of text layouts. This paper introduces a pre-trained language model, BERT Relying On Spatiality (BROS), which effectively utilizes the information included in individual text blocks and their layouts. Specifically, BROS encodes spatial information by utilizing relative positions and learns spatial dependencies between OCR blocks with a novel area-masking strategy. These two novel approaches lead to an efficient encoding of spatial layout information, highlighted by the robust performance of BROS under low-resource environments. We also introduce a general-purpose parser that can be combined with BROS to extract key information even when there is no order information between text blocks. BROS shows its superiority on four public benchmarks – FUNSD, SROIE*, CORD, and SciTSR – and its robustness in practical cases where order information of text blocks is not available. Further experiments with a varying number of training examples demonstrate the high training efficiency of our approach. Our code will be publicly available.







1. Introduction

Document intelligence (DI), which understands industrial documents from their visual appearance, is a critical application of AI in business. One important challenge of DI is the key information extraction (KIE) task (Huang et al., 2019; Jaume et al., 2019; Park et al., 2019)

that extracts structured information from documents such as financial reports, invoices, business emails, insurance quotes, and many others. The task requires a multi-disciplinary perspective spanning from computer vision for extracting text from document images to natural language processing for parsing key information from the identified texts.

Model Modality # Params F1 score
LayoutLM (Xu et al., 2020a) Text + Layout 113M 78.66
LayoutLM (Xu et al., 2020a) Text + Layout + Image 160M 79.27
LayoutLMv2 (Xu et al., 2020b) Text + Layout + Image 200M 82.76
BROS Text + Layout 110M 83.05
Table 1. FUNSD KIE performances of language models pre-trained on OCR results. BROS provides better performance with fewer parameters and without using Image features.

Optical character recognition (OCR) is an important first step to extract text blocks in document images. Then key information can be reconstructed from the extracted text blocks (Schuster et al., 2013; Qian et al., 2019; Hwang et al., 2019; Hwang et al., 2020). Although OCR alleviates the burden of processing raw images, the task still requires understanding semantic relations between text blocks.

To obtain a semantic representation of the text blocks, existing works utilize a pre-trained language model. Hwang et al. (2019) fine-tunes BERT (Devlin et al., 2019) by regarding KIE tasks as sequence tagging problems. Denk and Reisswig (2019) uses BERT to incorporate textual information into image pixels during their image segmentation tasks. However, as BERT is designed for understanding 1D text, they artificially convert text blocks distributed in 2D into a single pseudo 1D text losing spatial layout information.

To encode spatial information from document layouts, a new type of language model, LayoutLM, was recently proposed (Xu et al., 2020a). The model is pre-trained over text blocks extracted from a large corpus of industrial documents with various layouts. By learning the spatial semantics included in layouts, it shows much higher performance on KIE tasks than previous pre-trained language models trained on 1D text. However, in LayoutLM, the spatial semantics are learned in the style of BERT. Since BERT is specialized for 1D text, this implies there is still considerable room for improvement in encoding 2D text.

Here we propose BROS, a pre-trained language model for understanding 2D text. Compared to LayoutLM, BROS has the following improvements. First, BROS encodes a relative position between text blocks for layout understanding with a few additional parameters. By considering a relative position instead of an absolute position, the model can have flexibility on some variations on the image, such as translation. On the other hand, LayoutLM uses two absolute position embeddings for x- and y-axes following BERT. Second, BROS is pre-trained with a novel area-masking strategy. By explicitly masking 2D span of text blocks, the strategy guides a model to learn 2D semantic representation. On the other hand, LayoutLM utilizes a token-masking strategy following BERT in which the learning scope is mainly limited to local 1D semantics.

With these two improvements, BROS achieves better performance with fewer parameters than LayoutLM (Table 1, 1st and 2nd rows vs. final row). The fact that BROS outperforms even the recently published LayoutLMv2, which uses image features in addition to text blocks from OCR (Xu et al., 2020b), highlights the efficiency and effectiveness of BROS.

Aside from BROS, we also introduce a general-purpose parser, dubbed the Token Relationship decoder (TR decoder), that can be easily combined with pre-trained language models for KIE tasks. By explicitly decoding the relations between text blocks, the TR decoder can extract key information without relying on prior word order and can also solve entity linking problems. In contrast, a conventional BIO tagger always requires prior word ordering and cannot easily handle relations between text blocks.

We extensively validate BROS on four KIE benchmarks—FUNSD (form-like documents), SROIE* (receipts), CORD (receipts), and SciTSR (table structures)—under two settings: with or without word order information. BROS shows better performance over all datasets under both settings. In addition, BROS also performs consistently better when training with various amounts of pre-training and fine-tuning examples.

2. Related Work

2.1. Pre-trained Language Models

BERT (Devlin et al., 2019) is a pre-trained language model using Transformer (Vaswani et al., 2017)

that shows superior performance on various NLP tasks. The main strategy to train BERT is a masked language model (MLM) that masks and estimates randomly selected tokens to learn the semantics of language from large-scale corpora. Many variants of BERT have been introduced to learn transferable knowledge by modifying the pre-training strategy. XLNet 

(Yang et al., 2019) permutes tokens during the pre-training phase to reduce a discrepancy from the fine-tuning phase. XLNet also utilizes relative position encoding to handle long texts. StructBERT (Wang et al., 2020a) shuffles tokens in text spans and adds sentence prediction tasks for recovering the order of words or sentences. SpanBERT (Joshi et al., 2020) masks the span of tokens to extract better representation for span selection tasks such as question answering and co-reference resolution. ELECTRA (Clark et al., 2020) is trained to distinguish real and fake input tokens generated by another network for sample-efficient pre-training.

Inspired by these previous works, BROS utilizes a new pre-training strategy, named area-masked language model, that can capture complex spatial dependencies between text blocks distributed on 2D space. Note that LayoutLM (Xu et al., 2020a) is the first pre-trained language model on spatial text blocks but it still employs the original MLM of BERT.

2.2. Key Information Extraction from Documents

Early works (Liu et al., 2019; Yu et al., 2020; Qian et al., 2019; Katti et al., 2018) focused on capturing better representations of pair-wise relationships between text blocks identified by OCR, without using pre-trained language models. Following the promising performance of BERT on diverse NLP tasks, it began to be utilized for document KIE (Hwang et al., 2019; Denk and Reisswig, 2019; Hwang et al., 2020). However, BERT is not designed for text in 2D space, which limits performance on KIE tasks, especially those with small training datasets. To address this issue, LayoutLM (Xu et al., 2020a), a language model pre-trained on OCR results, was introduced. Following this line of research, this paper proposes an advanced model that enhances the conjugation of texts and layouts.

Recently, LayoutLMv2 (Xu et al., 2020b) is introduced to the public. It shows remarkable performance improvements on advanced KIE tasks by learning the multi-modality of image features and text blocks. Our model only encodes text and layout features since we focus on building a practical model for real-world scenarios.

In terms of generating outputs for KIE, most previous works utilize BIO tagging approaches (Hwang et al., 2019; Denk and Reisswig, 2019; Xu et al., 2020a; Xu et al., 2020b). Although a BIO tagger requires additional information about the order of text blocks, this has not been an issue because KIE benchmark datasets provide optimal reading orders of text blocks. In this work, we evaluate practical cases without prior order information by using our proposed TR decoder. Inspired by SPADE (Hwang et al., 2020), the TR decoder adopts a graph-based approach and can simply be attached to pre-trained language models.

Figure 1. An overview of BROS. Tokens in the document image are masked through the token- and area-masking strategies. The position differences between text blocks are encoded directly in the attention mechanism of the Transformer. The output token representations are used in both pre-training and fine-tuning.

3. BERT Relying on Spatiality (BROS)

The main structure of BROS follows LayoutLM, but there are two critical advances: (1) a spatial encoding method that captures the spatial relations between text blocks, and (2) a 2D pre-training objective designed for text blocks on 2D space. Figure 1 gives a visual description of BROS for document KIE tasks.

3.1. Encoding Spatial Information into BERT

How spatial relations between text blocks are represented is important for identifying the semantics constructed by layouts. We calculate relative positions for the four vertices of text blocks, apply sinusoidal functions to encode the distances, and combine them through a linear transformation to represent the spatial relation between two text blocks.

For a formal description, we use $p = (x, y)$ to denote a point on 2D space and $f^{sinu}: \mathbb{R} \rightarrow \mathbb{R}^{D_s}$ to represent a sinusoidal function, where $D_s$ is the dimension of the sinusoid embedding. BROS first normalizes all 2D points indicating the locations of the text blocks using the size of the image. It then encodes the spatial relation between two 2D points, $p^i$ and $p^j$, by applying the sinusoidal function to the gaps along the x- and y-axes and concatenating them as $f(p^i, p^j) = [f^{sinu}(x^i - x^j); f^{sinu}(y^i - y^j)] \in \mathbb{R}^{2D_s}$. The semicolon (;) indicates concatenation. The bounding box of a text block consists of four vertices, $p_{tl}$, $p_{tr}$, $p_{br}$, and $p_{bl}$, indicating the top-left, top-right, bottom-right, and bottom-left points, respectively. The spatial distances over the four point pairs are converted into vectors $\bar{\pi}^{ij}_{tl}$, $\bar{\pi}^{ij}_{tr}$, $\bar{\pi}^{ij}_{br}$, and $\bar{\pi}^{ij}_{bl}$ with $\bar{\pi}^{ij}_{*} = f(p^i_{*}, p^j_{*})$. Finally, to represent an embedding for the spatial relation between two text blocks, $\pi^{ij}$, BROS combines the four identified vectors through a linear transformation,

$$\pi^{ij} = W_{tl}\,\bar{\pi}^{ij}_{tl} + W_{tr}\,\bar{\pi}^{ij}_{tr} + W_{br}\,\bar{\pi}^{ij}_{br} + W_{bl}\,\bar{\pi}^{ij}_{bl},$$

where $W_{tl}, W_{tr}, W_{br}, W_{bl} \in \mathbb{R}^{(H/A) \times 2D_s}$ are linear transition matrices, $H$ is the hidden size of BERT, and $A$ is the number of self-attention heads. The periodic property of the sinusoidal function can encode continuous distances more naturally than the point-specific embeddings used in BERT and LayoutLM. By learning the linear transition parameters, BROS provides an effective representation of the spatial relation between text blocks. Note that the dimension of the final embedding is $H/A$, i.e., divided by $A$, so that a common embedding is shared over the multiple heads of the attention module; this reduces the memory burden of computing embeddings for all pairs of text blocks.
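The spatial-relation embedding described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the released implementation; the function names and the choice of 10000 as the sinusoid base are assumptions.

```python
import numpy as np

def sinusoidal(d, dim=24):
    """Sinusoidal encoding of a scalar distance d (dim must be even)."""
    i = np.arange(dim // 2)
    freqs = 1.0 / (10000 ** (2 * i / dim))  # assumed base, as in Transformer
    return np.concatenate([np.sin(d * freqs), np.cos(d * freqs)])

def point_relation(p_i, p_j, dim=24):
    """f(p^i, p^j): concatenated sinusoids of the x- and y-gaps."""
    return np.concatenate([sinusoidal(p_i[0] - p_j[0], dim),
                           sinusoidal(p_i[1] - p_j[1], dim)])

def spatial_embedding(box_i, box_j, W, dim=24):
    """pi^{ij}: sum of linear transforms over the four vertex pairs.
    box_*: four (x, y) vertices (tl, tr, br, bl), normalized by image size;
    W: list of four (H/A, 2*dim) transition matrices."""
    return sum(W[k] @ point_relation(box_i[k], box_j[k], dim)
               for k in range(4))
```

With $D_s = 24$ (the setting reported in Section 6.1) and $H/A = 768/12 = 64$, each pair of blocks yields a 64-dimensional shared spatial embedding.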

BROS directly encodes the spatial relations into the contextualization of text blocks. In detail, BROS calculates an attention logit combining both semantic and spatial features as follows:

$$a^{ij}_h = (W^h_q t^i)^\top (W^h_k t^j) + (W^h_q t^i)^\top \pi^{ij},$$

where $t^i$ and $t^j$ are the context representations of the $i$-th and $j$-th tokens and $W^h_q$ and $W^h_k$ are linear transition matrices for the $h$-th head. The former term is the same as the original attention mechanism in Transformer (Vaswani et al., 2017). The latter, motivated by Dai et al. (2019), considers the relative spatial information of the target text block given the source context and location. As mentioned above, the relative spatial embedding is shared across all attention heads for efficient memory usage. Compared to the spatial-aware attention in Xu et al. (2020b), ours has two major differences. First, our method couples the relative embeddings with the semantic information of tokens for better conjugation of texts and their spatial relations. Second, when calculating the relative spatial information between two text blocks, we consider all four vertices of the block. By doing so, our encoding incorporates not only relative distance but also relative shape and size, which play important roles in distinguishing keys and values in a document.
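For a single head, the combined attention logit can be sketched as below. This is a hypothetical NumPy sketch of the formula, not the actual model code; variable names are assumptions.

```python
import numpy as np

def spatial_attention_logit(t_i, t_j, pi_ij, W_q, W_k):
    """a^{ij} = (W_q t_i)^T (W_k t_j) + (W_q t_i)^T pi_ij  (one head).
    t_i, t_j: (H,) token representations; W_q, W_k: (H/A, H) head
    projections; pi_ij: (H/A,) spatial embedding shared across heads."""
    q = W_q @ t_i                 # query for token i
    k = W_k @ t_j                 # key for token j
    return float(q @ k + q @ pi_ij)  # semantic + spatial terms
```

The same query vector `q` multiplies both the key and the spatial embedding, which is what couples the relative position with the token's semantic content.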

3.2. Pre-training Objective: Area-masked Language Model

(a) Random token selection and token masking
(b) Random area selection and block masking
Figure 2. Illustrations of (a) conventional token-masking and (b) proposed area-masking. Token-masking selects tokens randomly (red) and masks them directly (gray), allowing the model to learn how to represent a token within its text block (blue). Area-masking identifies areas (red) by expanding randomly chosen text blocks and masks the tokens (gray) of all text blocks (blue) whose centers fall within the identified area. In both figures, 15% of tokens are masked.

Pre-training on diverse layouts from unlabeled documents is a key factor for document understanding tasks. BROS utilizes two pre-training objectives: a token-masked LM (TMLM) used in BERT and a novel area-masked LM (AMLM) introduced in this paper. The area-masked LM, inspired by SpanBERT (Joshi et al., 2020), captures consecutive text blocks within a 2D area of a document.

TMLM randomly masks tokens while keeping their spatial information, and the model then predicts the masked tokens using the spatial information and the remaining un-masked tokens as clues. The process is identical to the MLM of BERT and the Masked Visual-Language Model (MVLM) of LayoutLM. Figure 2(a) shows how TMLM masks tokens in a document. Since tokens in a text block can be partially masked, they can be estimated by referring to other tokens in the same block or to text blocks near the masked token.

AMLM masks all text blocks allocated in a randomly chosen area. It can be interpreted as span masking for text blocks in 2D space. Specifically, AMLM consists of the following four steps: (1) randomly select a text block, (2) identify an area by expanding the region of the text block, (3) determine the text blocks allocated in the area, and (4) mask all tokens of those text blocks and predict them. In the second step, the degree of expansion is determined by sampling a value from an exponential distribution with a rate hyper-parameter $\lambda$. The rationale behind using the exponential distribution is that it is the continuous counterpart of the geometric distribution that SpanBERT uses for the discrete domain; we set $\lambda$ according to the geometric distribution parameter $p = 0.2$ used in SpanBERT. We also truncate the exponential distribution at 1 to prevent large values from covering the whole document. It should be noted that the masking area is expanded from a randomly selected text block, since the area should be related to text sizes and locations to represent text spans in 2D space. Figure 2 compares token- and area-masking on text blocks. Because AMLM hides spatially close tokens together, their estimation requires more clues from text blocks far from the estimation targets.
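The four steps of area selection can be sketched as follows. This is a hypothetical NumPy sketch: the exact expansion rule and the default `lam` are assumptions, since the paper derives the rate from SpanBERT's geometric parameter.

```python
import numpy as np

def area_mask_blocks(boxes, lam=1.0, rng=None):
    """Pick a random block, expand it by a truncated-exponential factor,
    and return indices of blocks whose centers fall inside the area.
    boxes: (N, 4) array of normalized (x0, y0, x1, y1) boxes."""
    rng = rng or np.random.default_rng()
    i = rng.integers(len(boxes))                      # (1) random block
    e = min(rng.exponential(1.0 / lam), 1.0)          # (2) truncated at 1
    x0, y0, x1, y1 = boxes[i]
    w, h = x1 - x0, y1 - y0
    area = (x0 - e * w, y0 - e * h, x1 + e * w, y1 + e * h)
    cx = (boxes[:, 0] + boxes[:, 2]) / 2              # (3) block centers
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    inside = ((area[0] <= cx) & (cx <= area[2]) &
              (area[1] <= cy) & (cy <= area[3]))
    return np.flatnonzero(inside)                     # (4) blocks to mask
```

The seed block's own center is always inside the expanded area, so at least one block is masked per draw.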

Finally, BROS combines the two masked LMs, TMLM and AMLM, to stimulate the model to learn both individual and consolidated token representations. BROS first masks 15% of tokens for AMLM and then masks 15% of the tokens in the remaining text blocks for TMLM. As in BERT (Devlin et al., 2019), a masked token is replaced by the [MASK] token 80% of the time, a random token 10% of the time, and the original token the remaining 10%.
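The BERT-style 80/10/10 corruption applied to the selected positions can be sketched as below. This is an illustrative sketch, not the actual pre-processing code; the function name and vocabulary handling are assumptions.

```python
import random

def corrupt(tokens, masked_idx, vocab, mask_token="[MASK]", rng=None):
    """BERT-style corruption of the selected positions: 80% [MASK],
    10% random vocabulary token, 10% kept unchanged. The model is
    trained to predict the original token at every selected position."""
    rng = rng or random.Random()
    out = list(tokens)
    for i in masked_idx:
        r = rng.random()
        if r < 0.8:
            out[i] = mask_token          # 80%: replace with [MASK]
        elif r < 0.9:
            out[i] = rng.choice(vocab)   # 10%: replace with random token
        # else: 10% keep the original token
    return out
```

In BROS, `masked_idx` would be the union of the AMLM area-selected tokens and the TMLM randomly selected tokens.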

4. Parsers for KIE Tasks

KIE tasks can be categorized into two downstream tasks: (1) the entity extraction (EE) task and (2) the entity linking (EL) task. The EE task identifies a sequence of text blocks for key information (e.g., extracting the address text in a receipt), and the EL task determines relations between entities when the text blocks of the entities are known (e.g., identifying key and value text pairs).

To address the EE and EL tasks, we introduce two parsers: one is a sequence classifier (e.g. BIO tagger) that can operate based on prior information about an order of text blocks and the other is a graph-based parser that does not require any order information of text blocks. The following sections briefly introduce the two types of parsers.

4.1. Parser with Order Information

The BIO tagger, a representative parser that depends on the optimal reading order of text blocks, extracts key information by identifying the beginning and inside positions of the ordered text blocks. The optimal reading order of text blocks is an order in which all key information can be represented by its sub-sequences. The sequence classifier requires this optimality because it can never find key information from a wrong sequence. For example, if the two text blocks "Michael" and "Jackson" are ordered as "Jackson", "Michael", the sequence classifier cannot find "Michael Jackson".

(a) Recognized text blocks.
(b) Serialized text blocks.
(c) BIO-tagged text sequence.
Figure 3. Visual descriptions of how BIO tagger extracts entities in a document. All recognized tokens are serialized and classified. By combining sub-sequences identified by the BIO taggings, key information can be parsed from the recognized tokens.

Figure 3 shows how the BIO tagger performs the EE task for a given document. First, text blocks are recognized by an OCR engine (Figure 3, a). The recognized text blocks are then serialized by a serializer (Figure 3, b). Finally, a BIO class is predicted for each token, and key information is extracted by combining the predicted labels (Figure 3, c).
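As an illustration, entities can be decoded from BIO tags with a few lines of Python. This is a simple reference sketch of standard BIO decoding, not the paper's code.

```python
def decode_bio(tokens, tags):
    """Collect (class, text) entities from BIO tags, e.g. B-NAME/I-NAME
    runs. An I- tag only extends an entity of the same class."""
    entities, cur, cur_type = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if cur:
                entities.append((cur_type, " ".join(cur)))
            cur, cur_type = [tok], tag[2:]
        elif tag.startswith("I-") and cur and tag[2:] == cur_type:
            cur.append(tok)
        else:  # "O" tag or an I- tag that does not continue an entity
            if cur:
                entities.append((cur_type, " ".join(cur)))
            cur, cur_type = [], None
    if cur:
        entities.append((cur_type, " ".join(cur)))
    return entities
```

Note how the decoder depends entirely on the serialized order: if "Michael" and "Jackson" are swapped by the serializer, no tagging can recover "Michael Jackson".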

The BIO tagger cannot solve the EL task since links between text blocks cannot be represented as a sequence. In addition, a single text block can hold the same relationship with multiple other text blocks, and the sequence-based approach cannot represent such one-to-many relations either.

(a) Initial token classification
(b) Subsequent token classification
(c) Entity linking (EL) task
Figure 4. Visual descriptions of the TR decoder's downstream tasks. For EE tasks, the TR decoder combines the two sub-tasks (a) and (b): it identifies initial tokens and then connects subsequent tokens without any order information of text blocks. For the EL task, the TR decoder links the first tokens of the entities.

4.2. Parser without Order Information

In many practical cases, the optimal reading order is not available. Most OCR APIs provide a plausible reading order of text blocks based on rule-based approaches, but they cannot guarantee an optimal order (Li et al., 2020).

Here, we introduce a token relationship decoder, referred to as the TR decoder, that extracts key information without any order information. The key idea of the TR decoder is to extract a directed sub-graph from a fully-connected graph whose nodes are text blocks. Since there is no restriction on the connections between text blocks, the TR decoder does not require order information.

For EE tasks, the TR decoder divides the problem into two sub-tasks: initial token classification (Figure 4, a) and subsequent token classification (Figure 4, b). Let $t^i$ denote the representation of the $i$-th token from the last Transformer layer of BROS. The initial token classification conducts token-level tagging to determine whether a token is an initial token of target information as follows,

$$\hat{y}^{i}_{init} = \mathrm{softmax}(W_{init}\,t^i) \in \mathbb{R}^{C+1},$$

where $W_{init} \in \mathbb{R}^{(C+1) \times H}$ is a linear transition matrix and $C$ indicates the number of target classes. The extra +1 dimension is used to indicate non-initial tokens.

The subsequent token classification is conducted by utilizing pair-wise token representations as follows,

$$\hat{y}^{i}_{next} = \mathrm{softmax}\big((W_{src}\,t^i)^\top \big[W_{tgt}\,t^1, \ldots, W_{tgt}\,t^N, t^{\varnothing}\big]\big) \in \mathbb{R}^{N+1}.$$

Here, $W_{src}, W_{tgt} \in \mathbb{R}^{H' \times H}$ are linear transition matrices, $H'$ is a hidden feature dimension for the subsequent token classification decoder, and $N$ is the maximum number of tokens. The brackets indicate concatenation over the candidate tokens. $t^{\varnothing} \in \mathbb{R}^{H'}$ is a model parameter used to classify tokens that have no subsequent token or are not related to any class; it plays a role similar to the end-of-sequence token, [EOS], in NLP. By solving these two sub-tasks, the TR decoder identifies a sequence of text blocks by finding initial tokens and then connecting subsequent tokens.
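Given the two predictions, entity decoding reduces to following chains of subsequent-token links from each initial token. A hypothetical sketch (the argmax of the two classifiers is assumed to have been taken already):

```python
def decode_entities(init_labels, next_links, num_classes):
    """Follow chains of subsequent-token links from each initial token.
    init_labels[i]: predicted class in {0..C-1}, or C for non-initial;
    next_links[i]: index of token i's subsequent token, or None if the
    'no subsequent token' slot was predicted."""
    entities = []
    for i, c in enumerate(init_labels):
        if c == num_classes:          # non-initial token: not a start
            continue
        chain, j, seen = [i], next_links[i], {i}
        while j is not None and j not in seen:  # guard against cycles
            chain.append(j)
            seen.add(j)
            j = next_links[j]
        entities.append((c, chain))
    return entities
```

Because the chains are recovered from predicted links rather than from a serialized sequence, no reading order of the text blocks is needed.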

For EL tasks, the TR decoder conducts a binary classification for all possible pairs of tokens (Figure 4, c) as follows,

$$\hat{y}^{ij}_{rel} = \sigma\big((W_{src}\,t^i)^\top (W_{tgt}\,t^j)\big),$$

where $W_{src}, W_{tgt} \in \mathbb{R}^{H' \times H}$ are linear transition matrices and $H'$ is a hidden feature dimension. In contrast to the subsequent token classification, a single token can hold multiple relations with other tokens, which represents the hierarchical structures of document layouts.
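The pair-wise link probabilities can be computed in batch. A minimal NumPy sketch under the bilinear form assumed above; names are illustrative:

```python
import numpy as np

def link_scores(T, W_src, W_tgt):
    """Sigmoid over bilinear pair scores: scores[i, j] estimates whether
    token i links to token j. T: (N, H) token representations from the
    last layer; W_src, W_tgt: (H', H) projection matrices."""
    S = (T @ W_src.T) @ (T @ W_tgt.T).T   # (N, N) pair logits
    return 1.0 / (1.0 + np.exp(-S))       # independent sigmoids per pair
```

Because each pair is scored independently with a sigmoid rather than a softmax over candidates, one token can link to several others, which is what allows one-to-many relations such as a header linked to multiple questions.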

Model | Modality | Batch size, # Epochs | FUNSD EE (P / R / F1) | CORD EE (P / R / F1) | # Params
BERT-base (Xu et al., 2020a) | Text | - | 54.69 / 67.10 / 60.26 | 88.33 / 91.07 / 89.68 | 110M
BERT-large (Xu et al., 2020a) | Text | - | 61.13 / 70.85 / 65.63 | 88.86 / 91.68 / 90.25 | 340M
LayoutLM-base (Xu et al., 2020a) | Text + Layout | 80, 2 | 75.97 / 81.55 / 78.66 | 94.37 / 95.08 / 94.72 | 113M
LayoutLM-large (Xu et al., 2020b) | Text + Layout | 80, 2 | 75.96 / 82.19 / 78.95 | 94.32 / 95.54 / 94.93 | 343M
LayoutLM-base (Xu et al., 2020a) | Text + Layout + Image* | 80, 2 | 76.77 / 81.95 / 79.27 | - | 160M
BROS (Ours) | Text + Layout | 80, 2 | 79.85 / 84.38 / 82.05 | 96.87 / 96.58 / 96.72 | 110M
LayoutLMv2-base (Xu et al., 2020b) | Text + Layout + Image* | 64, 5 | 80.29 / 85.39 / 82.76 | 94.53 / 95.39 / 94.95 | 200M
BROS (Ours) | Text + Layout | 64, 5 | 81.16 / 85.02 / 83.05 | 96.89 / 96.44 / 96.66 | 110M
LayoutLMv2-large (Xu et al., 2020b) | Text + Layout + Image* | 2K, 20 | 83.24 / 85.19 / 84.20 | 95.65 / 96.37 / 96.01 | 426M
Table 2. Performance comparison on the FUNSD and CORD EE tasks.

5. Key Information Extraction Tasks

Here, we describe three EE tasks and three EL tasks from four KIE benchmark datasets.


  • Form Understanding in Noisy Scanned Documents (FUNSD) (Jaume et al., 2019) is a set of documents with various forms. The dataset consists of 149 training and 50 testing examples. FUNSD has both EE and EL tasks. In the EE task, there are three semantic entities: Header, Question, and Answer. In the EL task, the semantic hierarchies are represented as relations between text blocks like header-question and question-answer pairs.

  • SROIE* is a variant of Task 3 of "Scanned Receipts OCR and Information Extraction" (SROIE), which consists of a set of store receipts. In the original SROIE task, the semantic contents (Company, Date, Address, and Total price) are annotated without explicit connection to the text blocks. To convert SROIE into an EE task, we developed SROIE* by matching ground-truth contents with text blocks. We also split the original training set into 526 training and 100 testing examples because ground truths are not provided for the original test set. SROIE* will be publicly available.

  • Consolidated Receipt Dataset (CORD) (Park et al., 2019) is a set of store receipts with 800 training, 100 validation, and 100 testing examples. CORD consists of both EE and EL tasks. In the EE task, there are 30 semantic entities including menu name, menu price, and so on. In the EL task, the semantic entities are linked according to their layout structure. For example, menu name entities are linked to menu id, menu count, and menu price.

  • Complicated Table Structure Recognition (SciTSR) (Chi et al., 2019) is an EL task that connects cells in a table to recognize the table structure. There are two types of relations: vertical and horizontal connection between cells. The dataset consists of 12,000 training images and 3,000 test images.

Although these four datasets provide testbeds for the EE and EL tasks, they represent only a subset of real-world problems because the optimal order information of text blocks is given. FUNSD provides the optimal orders of text blocks related to target classes in both training and testing examples. In SROIE*, CORD, and SciTSR, the text blocks are serialized in reading order.

6. Experiments

6.1. Experiment Settings

For pre-training, the IIT-CDIP Test Collection 1.0 (Lewis et al., 2006), which consists of approximately 11M document images, is used, but the 400K images of the RVL-CDIP dataset (Harley et al., 2015) are excluded, following LayoutLM. To obtain text blocks from the document images, the CLOVA OCR API was applied.

The main Transformer structure of BROS is the same as that of BERT. Following BERT-base, the hidden size, the number of self-attention heads, the feed-forward/filter size, and the number of Transformer layers are set to 768, 12, 3072, and 12, respectively. The dimension of the sinusoid embedding, $D_s$, is set to 24. The same pre-training setting as LayoutLM is used for a fair comparison.

BROS is trained using the AdamW optimizer (Loshchilov and Hutter, 2019) with a learning rate of 5e-5 and linear decay. The first 10% of the total epochs are used for learning-rate warm-up. We initialized the weights of BROS with those of BERT-base and trained it on the IIT-CDIP dataset.

During fine-tuning, the learning rate is set to 5e-5 and the batch size to 16 for all tasks. The number of training epochs or steps is as follows: 100 epochs for FUNSD, 1K steps for SROIE* and CORD, and 7.5 epochs for SciTSR. When applying the TR decoder, the hidden feature dimension, $H'$, is set to 128 for FUNSD, 64 for SROIE*, and 256 for CORD and SciTSR. For all experiments, we repeated the experiment with 5 different random seeds and report the mean and standard deviation of the scores.

Although the authors of LayoutLM published their code on GitHub, the data and script files used for pre-training are not included. For a fair comparison, we made our own implementation, which we refer to as LayoutLM†, using the same pre-training data and scripts as BROS. We verified LayoutLM† by comparing its performance on FUNSD against the scores reported in Xu et al. (2020a). See Appendix A for more information.

6.2. Comparison with Other Pre-trained Models

Table 2 shows the performance comparison on the FUNSD and CORD EE tasks. For a fair comparison, we trained BROS for 2 epochs with a batch size of 80 and for 5 epochs with a batch size of 64, matching the training settings of LayoutLM (Xu et al., 2020a) and LayoutLMv2 (Xu et al., 2020b), respectively. For the downstream tasks, BROS utilizes a BIO tagger, the same as the baselines. When compared with the LayoutLM variants trained for 2 epochs with a batch size of 80, BROS shows the best performance even though it has the smallest number of parameters and uses no image features. In detail, the performance improvements on the FUNSD EE task from a larger model and from the use of image features are 0.29 percentage points (pp) (78.66 → 78.95) and 0.61pp (78.66 → 79.27), relative to the base LayoutLM. In contrast, BROS improves by 3.39pp (78.66 → 82.05) by only modifying the positional encoding and pre-training strategy. Moreover, when compared with LayoutLMv2 trained for 5 epochs with a batch size of 64, BROS performs better even though it has only 55% of LayoutLMv2's parameters (200M → 110M). LayoutLMv2 achieves the best FUNSD performance by using 20 epochs with a batch size of 2K, but its performance on CORD remains below that of BROS.

6.3. Experimental Results with Optimal Order Information

Dataset | Model | BIO tagger (Precision / Recall / F1) | TR decoder (Precision / Recall / F1)
FUNSD EE | BERT | 56.11±1.39 / 66.63±0.95 / 60.92±1.14 | 20.47±12.32 / 7.02±4.35 / 10.44±6.39
FUNSD EE | LayoutLM† | 76.12±0.46 / 81.88±0.33 / 78.89±0.30 | 76.46±0.64 / 78.17±0.70 / 77.30±0.64
FUNSD EE | BROS | 79.85±0.61 / 84.38±0.47 / 82.05±0.46 | 81.05±1.02 / 80.50±0.91 / 80.77±0.94
SROIE* EE | BERT | 92.90±1.08 / 94.47±0.86 / 93.67±0.73 | 50.46±5.24 / 40.00±3.83 / 44.53±4.00
SROIE* EE | LayoutLM† | 94.31±0.59 / 95.78±0.37 / 95.04±0.45 | 94.99±0.79 / 95.13±0.34 / 95.06±0.52
SROIE* EE | BROS | 95.60±0.51 / 96.13±0.61 / 95.87±0.54 | 95.61±0.72 / 96.18±0.49 / 95.89±0.59
CORD EE | BERT | 93.08±0.39 / 93.18±0.29 / 93.13±0.34 | 21.41±8.69 / 20.32±8.98 / 20.82±8.84
CORD EE | LayoutLM† | 95.03±0.21 / 94.58±0.16 / 94.80±0.16 | 95.52±0.30 / 94.79±0.25 / 95.15±0.27
CORD EE | BROS | 96.87±0.41 / 96.58±0.40 / 96.72±0.40 | 96.71±0.19 / 96.52±0.17 / 96.61±0.16
Table 3. Performance comparisons on three EE tasks with the optimal order information of text blocks.
Dataset | Model | Precision | Recall | F1
FUNSD EL | BERT | 5.22±4.97 | 0.56±0.52 | 0.98±0.88
FUNSD EL | LayoutLM† | 41.29±0.41 | 44.45±0.80 | 42.81±0.57
FUNSD EL | BROS | 67.63±0.84 | 72.35±0.67 | 69.91±0.65
CORD EL | BERT | 58.40±2.33 | 42.00±11.07 | 48.08±8.85
CORD EL | LayoutLM† | 91.37±0.47 | 89.52±0.30 | 90.43±0.32
CORD EL | BROS | 95.07±0.66 | 94.65±0.81 | 94.86±0.73
SciTSR EL | BERT | 87.61±0.52 | 85.92±0.57 | 86.76±0.39
SciTSR EL | LayoutLM† | 98.76±0.22 | 99.44±0.02 | 99.09±0.11
SciTSR EL | BROS | 98.92±0.11 | 99.59±0.01 | 99.32±0.06
Table 4. Performance comparisons on three EL tasks with the optimal order information of text blocks.

Here, we provide apples-to-apples comparisons of pre-trained language models by combining them with two decoders, the BIO tagger and the TR decoder. The benchmark datasets include the optimal order information of text blocks in a document, so experiments on them simulate the practical case in which order information is accessible. In this subsection, we compare BERT, LayoutLM†, and BROS. All models except BERT are pre-trained for 2 epochs with a batch size of 80. We utilize our implementation, LayoutLM†, for a deeper analysis across benchmarks (LayoutLM† shows slightly better performance than the original LayoutLM; see Table 11).

Table 3 shows the results on the three EE tasks, FUNSD, SROIE*, and CORD, using both parsers. The experiments show that BROS performs better than LayoutLM† regardless of the parser. Although the TR decoder does not utilize the additional ground-truth order information when identifying results, it shows comparable performance. However, the BIO tagger provides better results on most benchmark tasks. These results suggest using a BIO tagger when the optimal order information is available.

Table 4 provides the results on the three EL tasks, FUNSD, CORD, and SciTSR. Since EL tasks require links between text blocks, all models utilize the TR decoder. As with the results on the EE tasks, BROS outperforms LayoutLM† in all benchmarks. Interestingly, the performance of BERT is quite limited since EL tasks rely heavily on spatial information; performance increases in the order of LayoutLM† and then BROS as more spatial information is encoded. Note that this is the first evaluation of pre-trained models on EL tasks, as there was previously no parser for them.

6.4. Experimental Results without Optimal Order Information

To simulate the practical case in which an order of text blocks is not available, we remove the order information of the KIE benchmarks by randomly permuting the order of text blocks. We denote the permuted datasets as p-FUNSD, p-SROIE*, p-CORD, and p-SciTSR and compare BERT, LayoutLM†, and BROS. All models except BERT are pre-trained for 2 epochs with a batch size of 80. Note that we utilize the TR decoder for all models because BIO tagging on these permuted datasets cannot extract a sequence of text blocks in the correct order.

Dataset | Model | Precision | Recall | F1
p-FUNSD (EE) | BERT | 7.91 ± 0.54 | 4.86 ± 1.67 | 5.87 ± 1.60
p-FUNSD (EE) | LayoutLM | 34.98 ± 0.98 | 32.55 ± 0.68 | 33.72 ± 0.80
p-FUNSD (EE) | BROS | 74.64 ± 0.55 | 74.03 ± 0.57 | 74.33 ± 0.55
p-SROIE* (EE) | BERT | 7.29 ± 3.75 | 2.56 ± 1.31 | 3.79 ± 1.94
p-SROIE* (EE) | LayoutLM | 67.42 ± 1.54 | 66.48 ± 1.05 | 66.94 ± 1.14
p-SROIE* (EE) | BROS | 82.77 ± 1.05 | 83.02 ± 0.49 | 82.89 ± 0.76
p-CORD (EE) | BERT | 20.72 ± 8.79 | 19.56 ± 8.13 | 20.12 ± 8.45
p-CORD (EE) | LayoutLM | 77.49 ± 0.82 | 77.27 ± 0.86 | 77.38 ± 0.83
p-CORD (EE) | BROS | 95.20 ± 0.25 | 95.12 ± 0.30 | 95.16 ± 0.27
Table 5. Performance comparisons on three EE tasks without the optimal order information of text blocks. All models utilize the TR decoder because BIO tagging cannot be used without the optimal order.
Dataset | Model | Precision | Recall | F1
p-FUNSD (EL) | BERT | 3.77 ± 3.46 | 0.53 ± 0.51 | 0.89 ± 0.84
p-FUNSD (EL) | LayoutLM | 36.69 ± 2.16 | 31.25 ± 1.29 | 33.75 ± 1.66
p-FUNSD (EL) | BROS | 66.24 ± 1.30 | 68.33 ± 1.14 | 67.27 ± 1.15
p-CORD (EL) | BERT | 37.69 ± 12.53 | 4.99 ± 2.26 | 8.32 ± 3.19
p-CORD (EL) | LayoutLM | 58.26 ± 0.85 | 56.38 ± 0.75 | 57.30 ± 0.76
p-CORD (EL) | BROS | 86.64 ± 1.38 | 87.88 ± 0.79 | 87.25 ± 1.06
p-SciTSR (EL) | BERT | 59.42 ± 1.88 | 0.89 ± 0.17 | 1.75 ± 0.34
p-SciTSR (EL) | LayoutLM | 95.59 ± 0.30 | 99.04 ± 0.04 | 97.28 ± 0.14
p-SciTSR (EL) | BROS | 98.11 ± 0.33 | 99.38 ± 0.02 | 98.74 ± 0.16
Table 6. Performance comparisons on three EL tasks without the optimal order information of text blocks.
Model | p-FUNSD | xy-FUNSD | yx-FUNSD | FUNSD
LayoutLM | 33.72 ± 0.80 | 36.61 ± 0.60 | 60.50 ± 0.31 | 77.30 ± 0.64
BROS | 74.33 ± 0.55 | 75.05 ± 0.63 | 75.30 ± 0.67 | 80.77 ± 0.94
Table 7. Comparison of FUNSD EE performances according to sorting methods.

Table 5 and Table 6 show the results. Due to the loss of correct order, BERT performs poorly on both the EE and EL tasks. By utilizing spatial information of text blocks, LayoutLM performs better, but it still suffers a large performance drop compared to its scores with the optimal order information. In contrast, BROS achieves results comparable to the cases with the optimal order information and outperforms both BERT and LayoutLM.

To systematically investigate how order information affects model performance, we construct variants of FUNSD by re-ordering text blocks with two sorting methods based on their top-left points. The text blocks of xy-FUNSD are sorted by x-coordinate and then by y-coordinate, and those of yx-FUNSD are sorted by y-coordinate and then by x-coordinate. Table 7 shows the performance on p-FUNSD, xy-FUNSD, yx-FUNSD, and the original FUNSD. Interestingly, the performance of LayoutLM degrades in the order of FUNSD, yx-FUNSD, xy-FUNSD, and p-FUNSD, matching how reasonable each serialization of text on the 2D space is. In contrast, the performance of BROS is relatively consistent. These results show that BROS with a TR decoder is applicable to KIE problems without relying on an additional serialization method.
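The two sorting methods amount to swapping the primary and secondary sort keys on each block's top-left point. A small sketch, assuming blocks carry `x`/`y` fields for their top-left corner (the field names are illustrative):

```python
def sort_xy(blocks):
    """xy-FUNSD-style order: primary key x, tie-break by y (column-wise)."""
    return sorted(blocks, key=lambda b: (b["x"], b["y"]))


def sort_yx(blocks):
    """yx-FUNSD-style order: primary key y, tie-break by x (rough reading order)."""
    return sorted(blocks, key=lambda b: (b["y"], b["x"]))


blocks = [{"x": 5, "y": 1}, {"x": 0, "y": 9}, {"x": 0, "y": 2}]
print(sort_yx(blocks))  # [{'x': 5, 'y': 1}, {'x': 0, 'y': 2}, {'x': 0, 'y': 9}]
```

The yx order approximates natural top-to-bottom reading, which is consistent with yx-FUNSD hurting LayoutLM less than xy-FUNSD in Table 7.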

6.5. Ablation Studies

6.5.1. Gradually Adding Proposed Components to the Original LayoutLM

Model | EE: F | EE: S | EE: C | EL: F | EL: C | EL: Sci
LayoutLM | 76.89 | 94.99 | 94.37 | 41.98 | 90.29 | 99.06
+ pos enc. only | 78.84 | 95.45 | 96.36 | 60.26 | 95.09 | 99.22
+ objectives only | 78.44 | 94.81 | 95.95 | 43.39 | 95.28 | 99.20
+ both (= BROS) | 80.58 | 95.72 | 96.64 | 64.85 | 95.39 | 99.28
Table 8. Performance improvements on EE and EL tasks through adding components of BROS. In the last row, all components are changed from LayoutLM and the model becomes BROS. F, S, C, and Sci refer to FUNSD, SROIE*, CORD, and SciTSR, respectively.

To evaluate the performance improvements from the proposed components of BROS, we report experimental results as each component is added. Table 8 shows the changes in F1 score for the EE and EL tasks; all models in this table are trained for 1 epoch. Adding our proposed positional encoding consistently increases performance, by a large margin of 4.61pp on average over all tasks. Our pre-training objective combining TMLM and AMLM alone yields a 1.58pp average improvement. Using both, BROS achieves the best performance, with an average margin of 5.81pp. This ablation study shows that each component of BROS contributes to performance on its own and that their combination provides further gains.

6.5.2. Comparison between Pre-training Objectives

Table 9 shows the F1 scores of the EE and EL tasks according to the pre-training objective. All models are trained for 1 epoch, and all other settings follow BROS. In most cases, AMLM performs better than TMLM. In particular, on the FUNSD EL task, performance improves significantly with AMLM (60.69 → 63.02 and 63.52 → 65.09), which indicates that AMLM helps learn dependencies between text blocks. For a fair comparison with the proposed method, we also conduct experiments that increase the ratio of masked tokens to 30% for each objective. AMLM 30% and TMLM 15% + AMLM 15% both outperform TMLM 30%. The two perform similarly, but since each objective helps to learn different characteristics, we use both pre-training objectives.
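The 15% + 15% configuration simply splits the masking budget between the two objectives. A heavily simplified sketch of allocating disjoint index sets under that budget; note this is purely illustrative of the ratios: the real AMLM selects tokens by randomly expanded spatial areas over the document, not by index.

```python
import random


def combined_mask(n_tokens, ratio_tmlm=0.15, ratio_amlm=0.15, seed=0):
    """Split a masking budget into two disjoint index sets.

    The first set stands in for token-level masking (TMLM); the second
    stands in for area masking (AMLM). In BROS the AMLM set would be
    chosen by spatial areas rather than uniformly at random.
    """
    rng = random.Random(seed)
    idx = list(range(n_tokens))
    rng.shuffle(idx)
    n_t = int(n_tokens * ratio_tmlm)
    n_a = int(n_tokens * ratio_amlm)
    return set(idx[:n_t]), set(idx[n_t:n_t + n_a])


tmlm_idx, amlm_idx = combined_mask(512)  # 76 + 76 masked positions
```

Keeping the two sets disjoint ensures the total masked fraction matches the 30% used in the single-objective baselines, so the comparison in Table 9 isolates the objective rather than the masking budget.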

Figure 5. Performance comparisons according to the amount of pre-training data. Each point represents the result of fine-tuning after pre-training on 0, 500K, 1M, 2M, 11M, and 11M × 2 documents. The results of BERT are presented together as a baseline.
Figure 6. Performance comparisons according to the amount of fine-tuning data. Each point represents the result of fine-tuning using from 10% to 100% of training data.
Pre-training objective | EE: F | EE: S | EE: C | EL: F | EL: C | EL: Sci
TMLM 15% | 78.84 | 95.47 | 96.53 | 60.69 | 95.68 | 99.22
AMLM 15% | 79.80 | 95.31 | 96.71 | 63.02 | 94.87 | 99.27
TMLM 30% | 79.69 | 95.54 | 96.26 | 63.52 | 95.44 | 99.22
AMLM 30% | 80.62 | 95.55 | 96.48 | 65.09 | 95.30 | 99.32
TMLM 15% + AMLM 15% | 80.58 | 95.72 | 96.64 | 64.85 | 95.39 | 99.28
Table 9. Performance comparison on pre-training objectives. ‘%’ represents the ratio of tokens masked by the given method. F, S, C, and Sci refer to FUNSD, SROIE*, CORD, and SciTSR, respectively.

6.5.3. 1D Positional Embeddings in BERT

LayoutLM and BROS are initialized with the weights of BERT to leverage the knowledge BERT learns from large-scale corpora. However, BERT includes 1D positional embeddings (1D-PE) that might be harmful by imposing a sequence on text blocks even when no order information exists. To ablate 1D-PE, we pre-train two models, with and without the 1D-PE of BERT, for 1 epoch with a batch size of 80, and fine-tune them with TR decoders on both FUNSD and p-FUNSD. Using 1D-PE yields better performance by a large margin (80.21 with 1D-PE vs. 69.05 without on the FUNSD EE task). More interestingly, 1D-PE is also beneficial without the optimal order information (73.18 with 1D-PE vs. 69.05 without on p-FUNSD). We interpret these results as an effect of initializing from BERT weights: without the BERT initialization, we observed much lower performance on all datasets. These experiments show that there are dependencies between the base model used for initialization and language models pre-trained on OCR results.

6.6. Experiments on Training Efficiency

6.6.1. Training Efficiency in Pre-training

It is well known that more data leads to better models. Here, we examine how performance changes with the number of pre-training documents, comparing BROS and LayoutLM, both initialized with BERT.

Figure 5 shows the results on the three EE tasks according to the amount of pre-training data. Each point represents the result (F1 score) of fine-tuning after pre-training on 0, 500K, 1M, 2M, 11M, and 11M × 2 documents. The results of BERT are plotted as horizontal baselines because BERT does not use any OCR results in its pre-training phase. With no pre-training documents, BERT performs better than or comparably to LayoutLM, while BROS outperforms both, indicating that the BROS architecture for encoding 2D text information is more suitable for KIE tasks. As the number of pre-training documents increases, the performance of both BROS and LayoutLM gradually improves, but BROS remains consistently and significantly better. These results demonstrate the efficiency and effectiveness of BROS compared to LayoutLM.

Dataset | # data | BERT | LayoutLM | BROS
FUNSD (EE) | 5 | 31.51 | 50.88 | 64.88
FUNSD (EE) | 10 | 40.46 | 62.26 | 70.00
SROIE* (EE) | 5 | 31.01 | 38.04 | 41.63
SROIE* (EE) | 10 | 45.67 | 60.50 | 61.72
CORD (EE) | 5 | 48.38 | 56.53 | 58.97
CORD (EE) | 10 | 62.13 | 66.14 | 68.61
Table 10. Results of 5-shot and 10-shot learning.

6.6.2. Training Efficiency in Fine-tuning

One advantage of pre-trained models is that they perform well even with a small amount of fine-tuning data (Devlin et al., 2019). Since collecting and labeling fine-tuning data requires considerable time and money, achieving high performance from few fine-tuning examples is important for pre-trained models.

Figure 6 shows the results on the three EE tasks as the amount of training data used for fine-tuning varies from 10% to 100%. As in the ablation study, LayoutLM and BROS are pre-trained for 1 epoch. For all models, F1 scores tend to increase with the ratio of training data, and in most cases BROS achieves better performance than LayoutLM.

To test more extreme cases, we conduct experiments in few-shot learning settings (Wang et al., 2020b). Table 10 shows the results of 5-shot and 10-shot learning on the three EE tasks. For few-shot learning, we fine-tune models for 100 epochs with a batch size of 4. In all cases, our model performs best, demonstrating its generalization ability even with very few training examples.
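Constructing a k-shot split is a simple subsampling step; a minimal sketch (the helper name and dataset representation are illustrative, not from the paper's pipeline):

```python
import random


def few_shot_subset(examples, k, seed=0):
    """Sample k labeled documents for k-shot fine-tuning.

    `examples` is any sequence of labeled documents; a fixed seed keeps
    the split reproducible so all models see the same k examples.
    """
    rng = random.Random(seed)
    return rng.sample(list(examples), k)


train_docs = [f"doc_{i}" for i in range(149)]  # e.g. FUNSD's training split
shots_5 = few_shot_subset(train_docs, 5)
shots_10 = few_shot_subset(train_docs, 10)
```

Using the same seed across models matters here: with only 5 or 10 documents, the variance between random splits can easily exceed the gap between models.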

7. Conclusion

We propose BROS, a novel pre-trained language model for understanding 2D text in documents. By encoding 2D text with relative positions and pre-training with the area-masking strategy, BROS robustly contextualizes 2D text. This leads to an efficient encoding of 2D text and robust performance in low-resource environments. We also introduce a TR decoder that explicitly identifies relationships between text blocks; this new decoder can be used for both EE and EL tasks without relying on a prior ordering of the input text blocks. Our extensive experiments on four public benchmarks show that BROS consistently outperforms previous methods and is robust to ordering noise in text blocks. Further experiments varying the numbers of pre-training and fine-tuning examples demonstrate that BROS learns more general representations of text and layout than previous methods.


  • Chi et al. (2019) Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanxuan Yin, and Xian-Ling Mao. 2019. Complicated table structure recognition. arXiv preprint arXiv:1908.04729 (2019).
  • Clark et al. (2020) Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2020. ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In Proceedings of the 8th International Conference on Learning Representations (ICLR).
  • Dai et al. (2019) Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G Carbonell, Quoc Le, and Ruslan Salakhutdinov. 2019. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL).
  • Denk and Reisswig (2019) Timo I Denk and Christian Reisswig. 2019. BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding. In Workshop on Document Intelligence at NeurIPS 2019.
  • Devlin et al. (2019) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Volume 1 (Long and Short Papers). 4171–4186.
  • Harley et al. (2015) Adam W. Harley, Alex Ufkes, and Konstantinos G. Derpanis. 2015. Evaluation of deep convolutional nets for document image classification and retrieval. In Proceedings of the 13th International Conference on Document Analysis and Recognition (ICDAR). 991–995.
  • Huang et al. (2019) Zheng Huang, Kai Chen, Jianhua He, Xiang Bai, Dimosthenis Karatzas, Shijian Lu, and CV Jawahar. 2019. ICDAR2019 competition on scanned receipt ocr and information extraction. In Proceedings of the 15th International Conference on Document Analysis and Recognition (ICDAR). IEEE, 1516–1520.
  • Hwang et al. (2019) Wonseok Hwang, Seonghyeon Kim, Minjoon Seo, Jinyeong Yim, Seunghyun Park, Sungrae Park, Junyeop Lee, Bado Lee, and Hwalsuk Lee. 2019. Post-OCR parsing: building simple and robust parser via BIO tagging. In Workshop on Document Intelligence at NeurIPS 2019.
  • Hwang et al. (2020) Wonseok Hwang, Jinyeong Yim, Seunghyun Park, Sohee Yang, and Minjoon Seo. 2020. Spatial Dependency Parsing for 2D Document Understanding. arXiv preprint arXiv:2005.00642 (2020).
  • Jaume et al. (2019) Guillaume Jaume, Hazim Kemal Ekenel, and Jean-Philippe Thiran. 2019. FUNSD: A dataset for form understanding in noisy scanned documents. In 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), Vol. 2. IEEE, 1–6.
  • Joshi et al. (2020) Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S Weld, Luke Zettlemoyer, and Omer Levy. 2020. SpanBERT: Improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics (TACL) 8 (2020), 64–77.
  • Katti et al. (2018) Anoop R Katti, Christian Reisswig, Cordula Guder, Sebastian Brarda, Steffen Bickel, Johannes Höhne, and Jean Baptiste Faddoul. 2018. Chargrid: Towards Understanding 2D Documents. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP). 4459–4469.
  • Lewis et al. (2006) David Lewis, Gady Agam, Shlomo Argamon, Ophir Frieder, D Grossman, and Jefferson Heard. 2006. Building a test collection for complex document information processing. In Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR). 665–666.
  • Li et al. (2020) Liangcheng Li, Feiyu Gao, Jiajun Bu, Yongpan Wang, Zhi Yu, and Qi Zheng. 2020. An End-to-End OCR Text Re-organization Sequence Learning for Rich-text Detail Image Comprehension. In Proceedings of the 16th European Conference on Computer Vision (ECCV).
  • Liu et al. (2019) Xiaojing Liu, Feiyu Gao, Qiong Zhang, and Huasha Zhao. 2019. Graph Convolution for Multimodal Information Extraction from Visually Rich Documents. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Volume 2 (Industry Papers). 32–39.
  • Loshchilov and Hutter (2019) Ilya Loshchilov and Frank Hutter. 2019. Decoupled Weight Decay Regularization. In Proceedings of the 7th International Conference on Learning Representations (ICLR).
  • Park et al. (2019) Seunghyun Park, Seung Shin, Bado Lee, Junyeop Lee, Jaeheung Surh, Minjoon Seo, and Hwalsuk Lee. 2019. CORD: A Consolidated Receipt Dataset for Post-OCR Parsing. In Workshop on Document Intelligence at NeurIPS 2019.
  • Qian et al. (2019) Yujie Qian, Enrico Santus, Zhijing Jin, Jiang Guo, and Regina Barzilay. 2019. GraphIE: A Graph-Based Framework for Information Extraction. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Volume 1 (Long and Short Papers). 751–761.
  • Schuster et al. (2013) Daniel Schuster, Klemens Muthmann, Daniel Esser, Alexander Schill, Michael Berger, Christoph Weidling, Kamil Aliyev, and Andreas Hofmeier. 2013. Intellix–End-User Trained Information Extraction for Document Archiving. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR). IEEE, 101–105.
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30 (NeurIPS). 5998–6008.
  • Wang et al. (2020a) Wei Wang, Bin Bi, Ming Yan, Chen Wu, Jiangnan Xia, Zuyi Bao, Liwei Peng, and Luo Si. 2020a. StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding. In Proceedings of the 8th International Conference on Learning Representations (ICLR).
  • Wang et al. (2020b) Yaqing Wang, Quanming Yao, James T Kwok, and Lionel M Ni. 2020b. Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys (CSUR) 53, 3 (2020), 1–34.
  • Xu et al. (2020a) Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. 2020a. LayoutLM: Pre-training of text and layout for document image understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD). 1192–1200.
  • Xu et al. (2020b) Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, et al. 2020b. LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding. arXiv preprint arXiv:2012.14740 (2020).
  • Yang et al. (2019) Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. In Advances in Neural Information Processing Systems 32 (NeurIPS). 5753–5763.
  • Yu et al. (2020) Wenwen Yu, Ning Lu, Xianbiao Qi, Ping Gong, and Rong Xiao. 2020. PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks. In Proceedings of the 25th International Conference on Pattern Recognition (ICPR).


Appendix A Reproducing the LayoutLM

As mentioned in the paper, to compare BROS with LayoutLM in diverse experimental settings, we implemented LayoutLM in our own experimental pipeline. Table 11 compares our implementation with the scores reported in Xu et al. (2020a) across different amounts of pre-training data. Our implementation shows comparable performance in all settings.

# Pre-training data | # Epochs | Model | Precision | Recall | F1
500K | 1 | LayoutLM (Xu et al., 2020a) | 0.5779 | 0.6955 | 0.6313
500K | 1 | LayoutLM (ours) | 0.5823 | 0.6935 | 0.6330
1M | 1 | LayoutLM (Xu et al., 2020a) | 0.6156 | 0.7005 | 0.6552
1M | 1 | LayoutLM (ours) | 0.6142 | 0.7151 | 0.6608
2M | 1 | LayoutLM (Xu et al., 2020a) | 0.6599 | 0.7355 | 0.6957
2M | 1 | LayoutLM (ours) | 0.6562 | 0.7456 | 0.6980
11M | 1 | LayoutLM (Xu et al., 2020a) | 0.7464 | 0.7815 | 0.7636
11M | 1 | LayoutLM (ours) | 0.7384 | 0.8022 | 0.7689
11M | 2 | LayoutLM (Xu et al., 2020a) | 0.7597 | 0.8155 | 0.7866
11M | 2 | LayoutLM (ours) | 0.7612 | 0.8188 | 0.7889
Table 11. Sanity check of our LayoutLM implementation by comparing its performance on FUNSD with the scores reported in Xu et al. (2020a).