MATrIX – Modality-Aware Transformer for Information eXtraction

05/17/2022
by Thomas Delteil, et al.

We present MATrIX, a Modality-Aware Transformer for Information eXtraction in the Visual Document Understanding (VDU) domain. VDU covers information extraction from visually rich documents such as forms, invoices, receipts, tables, graphs, presentations, or advertisements. In these documents, text semantics and visual information complement each other to provide a global understanding of the document. MATrIX is pre-trained in an unsupervised way with specifically designed tasks that require the use of multi-modal information (spatial, visual, or textual). We consider the spatial and text modalities jointly, in a single token set. To make the attention more flexible, we use a learned modality-aware relative bias in the attention mechanism to modulate the attention between tokens of different modalities. We evaluate MATrIX on three different datasets, each with strong baselines.
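The abstract does not give the exact form of the modality-aware relative bias. As a minimal sketch, assuming the bias is a single learned scalar for each (query modality, key modality) pair that is added to the scaled dot-product scores before the softmax, single-head attention could look like the following. The modality ids `TEXT` and `VISUAL`, the function names, and the bias table are hypothetical; the paper's bias may also depend on relative position or differ per attention head, which this sketch ignores.

```python
import math

# Hypothetical modality ids; the abstract does not say how tokens are tagged.
TEXT, VISUAL = 0, 1


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, k, modalities, bias):
    """Scaled dot-product scores plus an additive bias bias[m_i][m_j]
    looked up by the modality pair of query token i and key token j.
    (Assumed form: one scalar per modality pair, no positional term.)"""
    scale = 1.0 / math.sqrt(len(q[0]))
    rows = []
    for qi, mi in zip(q, modalities):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) * scale + bias[mi][mj]
            for kj, mj in zip(k, modalities)
        ]
        rows.append(softmax(scores))
    return rows


def modality_aware_attention(q, k, v, modalities, bias):
    """Weighted sum of value vectors using the modality-biased weights."""
    weights = attention_weights(q, k, modalities, bias)
    dim = len(v[0])
    return [
        [sum(w * vj[c] for w, vj in zip(row, v)) for c in range(dim)]
        for row in weights
    ]


# Usage: two text tokens and one visual token in a single token set.
mods = [TEXT, TEXT, VISUAL]
q = k = [[0.0, 0.0]] * 3            # zero queries/keys: scores come only from the bias
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
bias = [[0.0, -1e9],                # text queries effectively ignore visual keys
        [0.0, 0.0]]                 # visual queries attend to every token equally
out = modality_aware_attention(q, k, v, mods, bias)
# Text queries average only the two text values: [0.5, 0.5]
# The visual query averages all three values: [2/3, 2/3]
```

A very large negative entry in the bias table acts like a mask between two modalities, whereas a learned, finite entry lets training decide how strongly each modality pair should interact.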


