Unshredding of Shredded Documents: Computational Framework and Implementation

A shredded document D is a document whose pages have been cut into strips in order to destroy the private, confidential, or sensitive information I contained in D. Shredding has become a standard way for government organizations, businesses, and private individuals to destroy archival records that have been officially classified for disposal. It can also be used to destroy documentary evidence of wrongdoing by entities trying to hide I. In this paper, we present an optimal O((n × m)^2) algorithm A that reconstructs an n-page document D in which each page p is shredded into m strips. We also report the efficacy of A in reconstructing three document types: handwritten, machine-typeset, and images.
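The abstract does not describe how A works, so the following is only an illustrative sketch of where a quadratic bound in the number of strips can arise. It is not the paper's algorithm A. It assumes each strip is a small grayscale pixel grid (a list of rows), pools all n × m strips together without separating pages, and uses a generic greedy edge-matching heuristic. The names `edge_cost` and `reconstruct` are hypothetical.

```python
def edge_cost(left_strip, right_strip):
    """Sum of absolute pixel differences along the seam between two strips."""
    return sum(abs(row_l[-1] - row_r[0])
               for row_l, row_r in zip(left_strip, right_strip))


def reconstruct(strips):
    """Greedily order strips left to right; returns a list of strip indices."""
    total = len(strips)  # total = n * m strips across all pages (assumption)
    if total == 0:
        return []
    # Pairwise seam costs: total * (total - 1) comparisons, i.e. quadratic.
    cost = [[edge_cost(strips[a], strips[b]) if a != b else float("inf")
             for b in range(total)] for a in range(total)]
    # Assumed heuristic: the leftmost strip is the one that no other strip
    # fits well in front of (its best incoming seam cost is the largest).
    start = max(range(total),
                key=lambda b: min(cost[a][b] for a in range(total)))
    order, used = [start], {start}
    while len(order) < total:
        last = order[-1]
        # Append the unused strip whose left edge best matches the current right edge.
        nxt = min((b for b in range(total) if b not in used),
                  key=lambda b: cost[last][b])
        order.append(nxt)
        used.add(nxt)
    return order


# Toy page: 4 rows by 8 columns of a horizontal gradient, cut into 4 strips of width 2.
page = [[10 * c for c in range(8)] for _ in range(4)]
strips = [[row[2 * k:2 * k + 2] for row in page] for k in range(4)]
shuffled = [strips[i] for i in (2, 0, 3, 1)]
order = reconstruct(shuffled)  # indices into `shuffled`, in left-to-right order
```

Both the cost table and the greedy chaining loop touch every pair of strips at most once, so the sketch performs on the order of (n × m)^2 seam comparisons. Real scans would need a more robust seam metric than raw pixel differences, and a greedy chain can fail on pages with large blank margins.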
