Robust Layout-aware IE for Visually Rich Documents with Pre-trained Language Models

05/22/2020
by Mengxi Wei, et al.

Many business documents processed in modern NLP and IR pipelines are visually rich: in addition to text, their semantics are also carried by visual traits such as layout, format, and fonts. We study the problem of information extraction from visually rich documents (VRDs) and present a model that combines large pre-trained language models with graph neural networks to efficiently encode both the textual and the visual information in business documents. We further introduce new fine-tuning objectives that improve in-domain unsupervised fine-tuning and make better use of large amounts of unlabeled in-domain data. We experiment on real-world invoice and resume data sets and show that the proposed method outperforms strong text-based RoBERTa baselines by 6.3% absolute F1 on invoices and 4.7% absolute F1 on resumes. When evaluated in a few-shot setting, our method requires up to 30x less annotation data than the baseline to achieve the same level of performance at ~90% F1.
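To make the idea of pairing text encodings with layout concrete, here is a minimal sketch and not the authors' architecture. It assumes each text segment already has a feature vector (a stand-in for the output of a pre-trained language model) and a bounding box, links each segment to its spatially nearest neighbours, and runs one unweighted mean-aggregation step over that graph. The function names, the k-nearest-neighbour graph construction, and the toy vectors are all hypothetical choices made for illustration.

```python
import math


def box_center(box):
    """Return the (x, y) center of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def knn_edges(boxes, k=2):
    """Connect each text segment to its k spatially nearest segments.

    Assumption: center-to-center Euclidean distance defines the layout graph.
    """
    centers = [box_center(b) for b in boxes]
    neighbors = []
    for i, (xi, yi) in enumerate(centers):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centers)
            if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def graph_layer(features, neighbors):
    """One mean-aggregation step: average each node's vector with its neighbors'."""
    out = []
    for i, vec in enumerate(features):
        group = [vec] + [features[j] for j in neighbors[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


# Toy page: a field label, its value on the same line, and a distant label.
boxes = [(0, 0, 10, 2), (12, 0, 20, 2), (0, 50, 10, 52)]
# Toy 2-d vectors standing in for language-model embeddings of each segment.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

edges = knn_edges(boxes, k=1)
mixed = graph_layer(features, edges)
```

In this toy page the label and its value sit on the same line, so they become each other's nearest neighbour and end up sharing information after one step. A learned model would replace the plain average with trainable weights.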

research
10/16/2021

MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding

Multimodal pre-training with text, layout, and image has made significan...
research
03/27/2019

Graph Convolution for Multimodal Information Extraction from Visually Rich Documents

Visually rich documents (VRDs) are ubiquitous in daily business and life...
research
05/01/2021

Capturing Logical Structure of Visually Structured Documents with Multimodal Transition Parser

While many NLP papers, tasks and pipelines assume raw, clean texts, many...
research
12/16/2021

Learning Rich Representation of Keyphrases from Text

In this work, we explore how to learn task-specific language models aime...
research
08/10/2021

BROS: A Layout-Aware Pre-trained Language Model for Understanding Documents

Understanding documents from their visual snapshots is an emerging probl...
research
12/20/2022

An Augmentation Strategy for Visually Rich Documents

Many business workflows require extracting important fields from form-li...
research
06/14/2022

FETILDA: An Effective Framework For Fin-tuned Embeddings For Long Financial Text Documents

Unstructured data, especially text, continues to grow rapidly in various...
