One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text

09/12/2022
by Abhinav Java, et al.

Active consumption of digital documents has opened new avenues for research in various applications, including search. Traditionally, searching within a document has been cast as a text-matching problem that ignores the rich layout and visual cues commonly present in structured documents, forms, etc. Motivated by this, we ask a mostly unexplored question: "Can we search for other similar snippets present in a target document page given a single query instance of a document snippet?" We propose MONOMER to solve this as a one-shot snippet detection task. MONOMER fuses context from the visual, textual, and spatial modalities of snippets and documents to find the query snippet in target documents. We conduct extensive ablations and experiments showing that MONOMER outperforms several baselines from one-shot object detection (BHRL), template matching, and document understanding (LayoutLMv3). Due to the scarcity of relevant data for the task at hand, we train MONOMER on programmatically generated data containing many visually similar query-snippet and target-document pairs from two datasets: Flamingo Forms and PubLayNet. We also conduct a human study to validate the generated data.
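
To make the task formulation concrete, here is a minimal sketch of one of the baseline families named above, template matching: a single query snippet is slid over a page image, and windows whose zero-mean normalized cross-correlation (NCC) reaches a threshold are reported as detections. This is not MONOMER, which fuses visual, textual, and spatial context; it is a pixel-only illustration that assumes a grayscale page and a query rendered at the same scale. The names `ncc` and `find_snippets` and the 0.9 default threshold are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def ncc(page: Image, query: Image, top: int, left: int) -> float:
    """Zero-mean normalized cross-correlation between `query` and the
    equally sized window of `page` whose top-left corner is (top, left)."""
    h, w = len(query), len(query[0])
    win = [page[top + r][left + c] for r in range(h) for c in range(w)]
    q = [query[r][c] for r in range(h) for c in range(w)]
    mw = sum(win) / len(win)
    mq = sum(q) / len(q)
    num = sum((a - mw) * (b - mq) for a, b in zip(win, q))
    den = math.sqrt(sum((a - mw) ** 2 for a in win) * sum((b - mq) ** 2 for b in q))
    return num / den if den > 0 else 0.0  # flat windows carry no signal


def find_snippets(page: Image, query: Image,
                  threshold: float = 0.9) -> List[Tuple[int, int, float]]:
    """Slide the single query snippet over the page and return
    non-overlapping (top, left, score) detections, best score first."""
    h, w = len(query), len(query[0])
    H, W = len(page), len(page[0])
    candidates = []
    for top in range(H - h + 1):
        for left in range(W - w + 1):
            score = ncc(page, query, top, left)
            if score >= threshold:
                candidates.append((top, left, score))
    candidates.sort(key=lambda t: t[2], reverse=True)
    # Greedy suppression: two same-sized boxes overlap iff they are closer
    # than h rows AND closer than w columns, so keep only disjoint ones.
    kept: List[Tuple[int, int, float]] = []
    for top, left, score in candidates:
        if all(abs(top - kt) >= h or abs(left - kl) >= w for kt, kl, _ in kept):
            kept.append((top, left, score))
    return kept


# Usage: a toy 6x8 "page" containing two copies of a 2x2 diagonal snippet.
page = [[0.0] * 8 for _ in range(6)]
for r, c in [(0, 0), (1, 1), (3, 5), (4, 6)]:
    page[r][c] = 1.0
query = [[1.0, 0.0], [0.0, 1.0]]
print(find_snippets(page, query))  # [(0, 0, 1.0), (3, 5, 1.0)]
```

Because it compares raw pixels at a fixed scale, such a baseline misses similar snippets that differ in size, font, or text content, which is one motivation for also using textual and spatial cues as MONOMER does.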

Related research

One-Shot Template Matching for Automatic Document Data Capture (10/22/2019)
In this paper, we propose a novel one-shot template-matching algorithm t...

Graph Convolution for Multimodal Information Extraction from Visually Rich Documents (03/27/2019)
Visually rich documents (VRDs) are ubiquitous in daily business and life...

A Retrieval Framework and Implementation for Electronic Documents with Similar Layouts (10/16/2018)
As the number of digital documents requiring investigation increases, it...

ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images (03/18/2020)
We introduce the Scanning Single Shot Detector (ScanSSD) for locating ma...

Know2Look: Commonsense Knowledge for Visual Search (09/02/2019)
With the rise in popularity of social media, images accompanied by conte...

Learning from similarity and information extraction from structured documents (10/17/2020)
Neural networks have successfully advanced in the task of information ex...

Cross-Modal Entity Matching for Visually Rich Documents (03/01/2023)
Visually rich documents (VRD) are physical/digital documents that utiliz...
