Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

02/18/2021
by Rafał Powalski, et al.

We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture which simultaneously learns layout information, visual features, and textual semantics. Contrary to previous approaches, we rely on a decoder capable of unifying a variety of problems involving natural language. The layout is represented as an attention bias and complemented with contextualized visual information, while the core of our model is a pretrained encoder-decoder Transformer. Our novel approach achieves state-of-the-art results in extracting information from documents and answering questions which demand layout understanding (DocVQA, CORD, WikiOps, SROIE). At the same time, we simplify the process by employing an end-to-end model.
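
The idea of representing layout as an attention bias can be illustrated with a small sketch. This is not the authors' implementation: the bucketing scheme, the names `bucket`, `layout_biased_attention`, `h_bias`, and `v_bias`, and the fixed bias values are all assumptions chosen for illustration. In a real model the bias tables would be learned. The sketch adds a scalar that depends on the horizontal and vertical offset between two tokens to their content-based attention score before the softmax.

```python
import math


def bucket(distance, bucket_size=10.0, max_bucket=3):
    """Map a signed distance to a bucket index in [-max_bucket, max_bucket].

    Truncates toward zero, then clamps. Hypothetical scheme, not TILT's.
    """
    index = int(distance / bucket_size)
    return max(-max_bucket, min(max_bucket, index))


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def layout_biased_attention(queries, keys, centers, h_bias, v_bias):
    """Return per-token attention weights with an additive layout bias.

    queries, keys: one float vector per token, all of equal length
    centers: (x, y) center of each token's bounding box
    h_bias, v_bias: bucket index -> scalar bias (assumed fixed here)
    """
    dim = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        row = []
        for j, k in enumerate(keys):
            # Content term: scaled dot product of query i and key j.
            content = sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            # Layout term: bias looked up from the relative 2D offset.
            dx = centers[j][0] - centers[i][0]
            dy = centers[j][1] - centers[i][1]
            row.append(content + h_bias[bucket(dx)] + v_bias[bucket(dy)])
        weights.append(softmax(row))
    return weights


# Three tokens with identical content vectors, so only layout differs.
# Token 1 sits on the same line as token 0; token 2 is far below.
vectors = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
centers = [(0.0, 0.0), (15.0, 0.0), (0.0, 40.0)]
h_bias = {b: 0.0 for b in range(-3, 4)}              # no horizontal penalty
v_bias = {b: -float(abs(b)) for b in range(-3, 4)}   # penalize vertical gaps
weights = layout_biased_attention(vectors, vectors, centers, h_bias, v_bias)
```

With these assumed bias values, token 0 attends more strongly to token 1 (same line) than to token 2 (several lines below), even though all three tokens have identical content vectors.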

research
02/19/2020

LAMBERT: Layout-Aware language Modeling using BERT for information extraction

In this paper we introduce a novel approach to the problem of understand...
research
06/02/2023

DocFormerv2: Local Features for Document Understanding

We propose DocFormerv2, a multi-modal transformer for Visual Document Un...
research
05/27/2020

TRIE: End-to-End Text Reading and Information Extraction for Document Understanding

Since real-world ubiquitous documents (e.g., invoices, tickets, resumes ...
research
06/22/2021

DocFormer: End-to-End Transformer for Document Understanding

We present DocFormer – a multi-modal transformer based architecture for ...
research
12/23/2021

LaTr: Layout-Aware Transformer for Scene-Text VQA

We propose a novel multimodal architecture for Scene Text Visual Questio...
research
04/24/2023

PARAGRAPH2GRAPH: A GNN-based framework for layout paragraph analysis

Document layout analysis has a wide range of requirements across various...
research
12/06/2022

Multimodal Tree Decoder for Table of Contents Extraction in Document Images

Table of contents (ToC) extraction aims to extract headings of different...
