Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning

09/30/2020
by Subhojeet Pramanik, et al.

In this paper, we propose a multi-task learning-based framework that combines self-supervised and supervised pre-training tasks to learn a generic document representation. We design the network architecture and the pre-training tasks to incorporate multi-modal document information across the text, layout, and image dimensions, and to allow the network to work with multi-page documents. We showcase the applicability of our pre-training framework on a variety of real-world document tasks, such as document classification, document information extraction, and document retrieval. We conduct exhaustive experiments to compare performance against different ablations of our framework and against state-of-the-art baselines. We also discuss the current limitations of our work and its next steps.

