Deep Visual Template-Free Form Parsing

09/05/2019
by   Brian Davis, et al.
12

Automatic, template-free extraction of information from form images is challenging due to the variety of form layouts. This is even more challenging for historical forms due to noise and degradation. A crucial part of the extraction process is associating input text with pre-printed labels. We present a learned, template-free solution to detecting pre-printed text and input text/handwriting and predicting pair-wise relationships between them. While previous approaches to this problem have been focused on clean images and clear layouts, we show our approach is effective in the domain of noisy, degraded, and varied form images. We introduce a new dataset of historical form images (late 1800s, early 1900s) for training and validating our approach. Our method uses a convolutional network to detect pre-printed text and input text lines. We pool features from the detection network to classify possible relationships in a language-agnostic way. We show that our proposed pairing method outperforms heuristic rules and that visual features are critical to obtaining high accuracy.

READ FULL TEXT

page 1

page 3

page 6

research
03/15/2021

Generating Synthetic Handwritten Historical Documents With OCR Constrained GANs

We present a framework to generate synthetic historical documents with p...
research
03/08/2022

Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting

Recently, Vision-Language Pre-training (VLP) techniques have greatly ben...
research
07/21/2018

Equal But Not The Same: Understanding the Implicit Relationship Between Persuasive Images and Text

Images and text in advertisements interact in complex, non-literal ways....
research
12/15/2020

docExtractor: An off-the-shelf historical document element extraction

We present docExtractor, a generic approach for extracting visual elemen...
research
05/17/2013

Font Acknowledgment and Character Extraction of Digital and Scanned Images

The font recognition and character extraction is of immense importance a...
research
05/16/2018

Convolutional Neural Network Architecture for Recovering Watermark Synchronization

Since real-time contents can be captured and downloaded very easily, cop...

Please sign up or login with your details

Forgot password? Click here to reset