Multi-Modal Association based Grouping for Form Structure Extraction

07/09/2021
by Milan Aggarwal, et al.

Document structure extraction has been a widely researched area for decades. Recent work in this direction has been deep learning-based, mostly focusing on extracting structure with fully convolutional networks through semantic segmentation. In this work, we present a novel multi-modal approach for form structure extraction. Given simple elements such as textruns and widgets, we extract higher-order structures such as TextBlocks, Text Fields, Choice Fields, and Choice Groups, which are essential for information collection in forms. To achieve this, for each low-level element (the reference) we identify the candidate elements closest to it and obtain a local image patch around the reference. We process the textual and spatial representations of the candidates sequentially through a BiLSTM to obtain context-aware representations and fuse them with image patch features obtained by passing the patch through a CNN. Subsequently, a sequential decoder takes this fused feature vector and predicts the association type between the reference and each candidate. These predicted associations are then used to determine larger structures through connected components analysis. Experimental results show the effectiveness of our approach, which achieves recall of 90.29 and 73.80 and significantly outperforms semantic segmentation baselines. We show the efficacy of our method through ablations, comparing it against using individual modalities. We also introduce our new, richly human-annotated Forms Dataset.
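The final grouping step described above, connected components analysis over predicted associations, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `group_elements`, the integer element indices, and the input format (a list of index pairs that a model has already predicted as belonging to the same structure) are all assumptions made for illustration, and a simple union-find is used to compute the components.

```python
def group_elements(num_elements, associated_pairs):
    """Group element indices into connected components.

    num_elements: number of low-level elements (textruns, widgets).
    associated_pairs: hypothetical list of (reference, candidate) index
        pairs that were predicted to belong to the same structure.
    Returns a sorted list of groups, each a sorted list of indices.
    """
    parent = list(range(num_elements))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the components of every associated pair.
    for a, b in associated_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect elements by the root of their component.
    groups = {}
    for i in range(num_elements):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Elements 0-2 are chained by associations, 3-4 form a second group.
    print(group_elements(5, [(0, 1), (1, 2), (3, 4)]))
```

In this sketch an element with no predicted association ends up in a group of its own. A pipeline with several association types would likely run the grouping once per type, but that detail is not specified in the abstract.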


