TransferDoc: A Self-Supervised Transferable Document Representation Learning Model Unifying Vision and Language

09/11/2023
by Souhail Bakkali, et al.

The field of visual document understanding has seen rapid growth in both emerging challenges and powerful multi-modal strategies. However, these strategies rely on extensive amounts of document data to learn their pretext objectives in a “pre-train-then-fine-tune” paradigm and therefore suffer a significant performance drop in real-world online industrial settings. One major reason is their over-reliance on OCR engines to extract local positional information within a document page, which hinders generalizability, flexibility and robustness because global information within the document image is not captured. We introduce TransferDoc, a cross-modal transformer-based architecture pre-trained in a self-supervised fashion using three novel pretext objectives. TransferDoc learns richer semantic concepts by unifying language and visual representations, which enables the production of more transferable models. In addition, two novel downstream tasks are introduced for a “closer-to-real” industrial evaluation scenario, in which TransferDoc outperforms other state-of-the-art approaches.
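To make the idea of self-supervised cross-modal pre-training concrete, below is a minimal sketch of one plausible pretext objective: a contrastive alignment loss that pulls together pooled embeddings of a document image and its page text in a shared space. The module names, dimensions, shared-encoder design, and the InfoNCE-style loss are illustrative assumptions for this sketch, not TransferDoc's actual three pretext tasks or architecture.

```python
# Minimal sketch (not the authors' code): a cross-modal encoder that projects
# visual patch features and page-text token features into a shared space and
# aligns matching (image, text) pairs with a contrastive objective.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalEncoder(nn.Module):
    def __init__(self, vis_dim=768, txt_dim=768, proj_dim=256, n_layers=2, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=proj_dim, nhead=n_heads, batch_first=True)
        self.vis_proj = nn.Linear(vis_dim, proj_dim)   # project image-patch features
        self.txt_proj = nn.Linear(txt_dim, proj_dim)   # project text-token features
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)  # shared encoder (assumption)
        self.temperature = nn.Parameter(torch.tensor(0.07))

    def forward(self, vis_feats, txt_feats):
        # vis_feats: (B, Nv, vis_dim) patch embeddings from a visual backbone
        # txt_feats: (B, Nt, txt_dim) token embeddings from a language model
        v = self.encoder(self.vis_proj(vis_feats)).mean(dim=1)  # pooled visual view
        t = self.encoder(self.txt_proj(txt_feats)).mean(dim=1)  # pooled textual view
        return F.normalize(v, dim=-1), F.normalize(t, dim=-1)


def contrastive_alignment_loss(v, t, temperature):
    # Symmetric InfoNCE: embeddings of the same document page are pulled
    # together, mismatched image/text pairs in the batch are pushed apart.
    logits = v @ t.t() / temperature
    targets = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    model = CrossModalEncoder()
    vis = torch.randn(4, 196, 768)   # e.g. ViT patch features of a document image
    txt = torch.randn(4, 128, 768)   # e.g. word/sentence embeddings of the page text
    v, t = model(vis, txt)
    loss = contrastive_alignment_loss(v, t, model.temperature)
    loss.backward()
    print(loss.item())
```

Because the textual view here comes from page-level text rather than OCR bounding boxes, an objective of this kind would not depend on local positional information from an OCR engine, which is the dependency the abstract identifies as a bottleneck.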


