Skim-Attention: Learning to Focus via Document Layout

09/02/2021
by Laura Nguyen et al.

Transformer-based pre-training techniques for text and layout have proven effective in a number of document understanding tasks. Despite this success, multimodal pre-trained models suffer from very high computational and memory costs. Motivated by human reading strategies, this paper presents Skim-Attention, a new attention mechanism that takes advantage of the structure of the document and its layout. Skim-Attention attends only to the 2-dimensional positions of the words in a document. Our experiments show that Skim-Attention obtains a lower perplexity than prior works while being more computationally efficient. Skim-Attention can further be combined with long-range Transformers to efficiently process long documents. We also show how Skim-Attention can be used off-the-shelf as a mask for any pre-trained language model, improving performance while restricting attention. Finally, we show the emergence of a document structure representation in Skim-Attention.
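The core idea in the abstract can be illustrated with a minimal sketch: attention weights are computed from layout (2-D position) embeddings alone, and those weights are then used to mix the token representations. This is a hypothetical simplification, not the paper's implementation; it uses a single head, omits the learned query/key projections, and the names `skim_attention`, `layout_emb`, and `token_values` are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def skim_attention(layout_emb, token_values):
    """Attention computed only from layout embeddings (one per word).

    layout_emb:   (n, d) embeddings of each word's 2-D position on the page
    token_values: (n, h) text representations mixed by the layout-derived weights
    """
    d = layout_emb.shape[-1]
    # Scores depend on layout only; the text never enters the attention computation.
    scores = layout_emb @ layout_emb.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)  # rows sum to 1
    return weights @ token_values, weights

# Toy usage: 5 words with 8-dim layout embeddings and 16-dim text states.
rng = np.random.default_rng(0)
out, w = skim_attention(rng.normal(size=(5, 8)), rng.normal(size=(5, 16)))
```

Because the weights depend only on word positions, they can be computed once and reused across layers, or applied as a fixed attention mask on top of an ordinary pre-trained language model, which is the "off-the-shelf mask" use the abstract mentions.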


Related research

10/16/2021 · MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding
Multimodal pre-training with text, layout, and image has made significan...

03/25/2022 · Multimodal Pre-training Based on Graph Attention Network for Document Understanding
Document intelligence as a relatively new research topic supports many b...

12/31/2019 · LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Pre-training techniques have been verified successfully in a variety of ...

12/15/2021 · LongT5: Efficient Text-To-Text Transformer for Long Sequences
Recent work has shown that either (1) increasing the input length or (2)...

10/12/2022 · ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding
Recent years have witnessed the rise and success of pre-training techniq...

02/22/2022 · Socialformer: Social Network Inspired Long Document Modeling for Document Ranking
Utilizing pre-trained language models has achieved great success for neu...

08/31/2023 · Document Layout Analysis on BaDLAD Dataset: A Comprehensive MViTv2 Based Approach
In the rapidly evolving digital era, the analysis of document layouts pl...
