
OCR Graph Features for Manipulation Detection in Documents

09/10/2020
by Hailey James, et al.

Detecting manipulations in digital documents is becoming increasingly important for information verification. Because image editing software has proliferated, altering key information in documents has become widely accessible. Nearly all approaches in this domain are procedural, relying on carefully crafted features and a hand-tuned scoring system rather than a data-driven, generalizable model. We frame the problem as a graph comparison over character bounding boxes and propose a model that leverages graph features derived from OCR (Optical Character Recognition). The model detects alterations in a data-driven way by training a random forest classifier on these graph-based OCR features. We evaluate its forgery detection performance on a dataset constructed from real business documents containing slight forgery imperfections. Our proposed model dramatically outperforms the most closely related document manipulation detection model on this task.
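
The abstract does not specify how the graph is built or which features are extracted, so the following is only a hypothetical sketch of the general idea, not the authors' method. It assumes OCR yields one axis-aligned box per character, links each character to its right-hand neighbour on the same text line, and summarises the resulting edges as a small feature vector (gap spread, largest gap deviation, baseline offset spread, height-ratio spread). The names `CharBox`, `build_line_graph`, `edge_features`, and the `line_tol` parameter are all illustrative inventions.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Tuple


@dataclass(frozen=True)
class CharBox:
    """Hypothetical OCR output: one character and its box (y grows downward)."""
    char: str
    x0: float
    y0: float  # top edge
    x1: float
    y1: float  # bottom edge (used as a baseline proxy)


def build_line_graph(boxes: List[CharBox], line_tol: float = 0.5) -> List[Tuple[int, int]]:
    """Link each character to its right-hand neighbour on the same text line.

    Boxes join the current line when their vertical centre lies within
    line_tol * (mean box height) of that line's mean centre (an assumed rule).
    """
    if not boxes:
        return []
    avg_h = mean(b.y1 - b.y0 for b in boxes)
    centre = lambda i: (boxes[i].y0 + boxes[i].y1) / 2
    # Visit characters top to bottom so members of one line are contiguous.
    lines: List[List[int]] = []
    for i in sorted(range(len(boxes)), key=centre):
        if lines and abs(centre(i) - mean(centre(k) for k in lines[-1])) < line_tol * avg_h:
            lines[-1].append(i)
        else:
            lines.append([i])
    edges: List[Tuple[int, int]] = []
    for line in lines:
        line.sort(key=lambda k: boxes[k].x0)  # left-to-right reading order
        edges.extend(zip(line, line[1:]))
    return edges


def edge_features(boxes: List[CharBox], edges: List[Tuple[int, int]]) -> List[float]:
    """Summarise neighbour relations as a fixed-length vector (illustrative features)."""
    if not edges:
        return [0.0, 0.0, 0.0, 0.0]
    gaps = [boxes[j].x0 - boxes[i].x1 for i, j in edges]
    baseline = [boxes[j].y1 - boxes[i].y1 for i, j in edges]
    height_ratio = [(boxes[j].y1 - boxes[j].y0) / (boxes[i].y1 - boxes[i].y0) for i, j in edges]
    mean_gap = mean(gaps)
    return [
        pstdev(gaps),                              # irregular character spacing
        max(abs(g - mean_gap) for g in gaps),      # single out-of-place gap
        pstdev(baseline),                          # characters off the baseline
        pstdev(height_ratio),                      # inconsistent glyph heights
    ]


if __name__ == "__main__":
    clean = [CharBox(c, 10.0 * k, 0.0, 10.0 * k + 8.0, 12.0) for k, c in enumerate("12345")]
    forged = list(clean)
    forged[2] = CharBox("8", 21.0, -2.0, 29.0, 10.0)  # pasted digit, slightly shifted
    for name, doc in (("clean", clean), ("forged", forged)):
        print(name, edge_features(doc, build_line_graph(doc)))
```

In a data-driven pipeline, vectors like these (computed per document or per text region) could then be stacked into a training matrix and passed to a random forest, for example scikit-learn's `RandomForestClassifier` via `fit(X, y)` and `predict(X)`. The abstract's graph comparison framing suggests the actual features compare graphs rather than summarise a single one, so treat this as an illustration of the ingredients only.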
