Efficient Document Image Classification Using Region-Based Graph Neural Network

06/25/2021
by   Jaya Krishna Mandivarapu, et al.

Document image classification remains a popular research area because it can be commercialized in many enterprise applications across different industries. Recent advances in large pre-trained computer vision and language models and in graph neural networks have given document image classification many new tools. However, using large pre-trained models usually requires substantial computing resources, which could defeat the cost-saving advantages of automatic document image classification. In this paper, we propose an efficient document image classification framework that uses graph convolutional neural networks and incorporates the textual, visual, and layout information of the document. We have rigorously benchmarked our proposed algorithm against several state-of-the-art vision and language models on both a publicly available dataset and a real-life insurance document classification dataset. Empirical results on both the publicly available and the real-world data show that our methods achieve near-SOTA performance yet require far fewer computing resources and much less time for model training and inference. This results in solutions that offer better cost advantages, especially in scalable deployment for enterprise applications. We also provide comprehensive comparisons of computing resources, model sizes, and training and inference time between our proposed methods and the baselines. In addition, we delineate the cost per image for our method and the baselines.

research
03/04/2022

DiT: Self-supervised Pre-training for Document Image Transformer

Image Transformer has recently achieved significant progress for natural...
research
08/16/2019

PubLayNet: largest dataset ever for document layout analysis

Recognizing the layout of unstructured digital documents is an important...
research
05/24/2021

StructuralLM: Structural Pre-training for Form Understanding

Large pre-trained language models achieve state-of-the-art results when ...
research
08/29/2023

Vision Grid Transformer for Document Layout Analysis

Document pre-trained models and grid-based models have proven to be very...
research
10/02/2018

Ancient Coin Classification Using Graph Transduction Games

Recognizing the type of an ancient coin requires theoretical expertise a...
research
07/27/2023

Text-guided Foundation Model Adaptation for Pathological Image Classification

The recent surge of foundation models in computer vision and natural lan...
research
05/24/2022

An interpretation of the final fully connected layer

In recent years neural networks have achieved state-of-the-art accuracy ...
