Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

02/18/2021
by Rafał Powalski, et al.

We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture, which simultaneously learns layout information, visual features, and textual semantics. Unlike previous approaches, we rely on a decoder capable of unifying a variety of problems involving natural language. The layout is represented as an attention bias and complemented with contextualized visual information, while the core of our model is a pretrained encoder-decoder Transformer. Our novel approach achieves state-of-the-art results in extracting information from documents and answering questions that demand layout understanding (DocVQA, CORD, WikiOps, SROIE). At the same time, we simplify the process by employing an end-to-end model.
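
The abstract states only that layout is represented as an attention bias. As a rough illustration of what that can mean, the sketch below adds a layout-dependent scalar to each query-key score before the softmax. Everything specific here is an assumption made for illustration and is not taken from the abstract: the bucketing of horizontal and vertical distances between token box centres, the bucket size, the fixed penalty tables (which a real model would learn), the square-root scaling, and all function names.

```python
import math

def bucket(distance, bucket_size=50.0, max_bucket=3):
    """Map a signed distance to an integer bucket, truncating toward zero
    and clamping to [-max_bucket, max_bucket]. (Hypothetical scheme.)"""
    b = int(distance / bucket_size)
    return max(-max_bucket, min(max_bucket, b))

def layout_bias(boxes, h_table, v_table):
    """Build an n x n additive bias from token bounding boxes.

    boxes:   list of (x0, y0, x1, y1) tuples, one per token
    h_table: dict mapping horizontal bucket -> scalar bias
    v_table: dict mapping vertical bucket -> scalar bias
    """
    centres = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]
    n = len(centres)
    bias = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dx = centres[j][0] - centres[i][0]
            dy = centres[j][1] - centres[i][1]
            bias[i][j] = h_table[bucket(dx)] + v_table[bucket(dy)]
    return bias

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys, bias):
    """Scaled dot-product attention weights with an additive bias term."""
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + bias[i][j]
            for j, k in enumerate(keys)
        ]
        weights.append(softmax(scores))
    return weights

# Toy input: tokens 0 and 1 sit on the same line, token 2 is far below.
boxes = [(0, 0, 40, 10), (45, 0, 85, 10), (0, 300, 40, 310)]
# Fixed stand-in tables that penalise distant buckets; learned in practice.
h_table = {b: -abs(b) for b in range(-3, 4)}
v_table = {b: -abs(b) for b in range(-3, 4)}
# Identical content vectors, so any difference in weights comes from layout.
vectors = [[1.0, 0.0]] * 3

bias = layout_bias(boxes, h_table, v_table)
weights = attention_weights(vectors, vectors, bias)
```

With identical content vectors, token 0 attends more strongly to its neighbour on the same line than to the token far below it, which is the effect an additive layout bias is meant to produce.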

