BiNet: Degraded-Manuscript Binarization in Diverse Document Textures and Layouts using Deep Encoder-Decoder Networks

11/13/2019
by Maruf A. Dhali, et al.

Handwritten document-image binarization is a semantic segmentation process that differentiates ink pixels from background pixels. It is an essential step towards character recognition, writer identification, and script-style evolution analysis. The binarization task itself is challenging due to the vast diversity of writing styles, inks, and paper materials. It is even more difficult for historical manuscripts because the documents age and degrade over time. One such manuscript collection is the Dead Sea Scrolls (DSS) image collection, which poses extreme challenges for existing binarization techniques. This article proposes a new binarization technique for the DSS images using deep encoder-decoder networks. Although the artificial neural network proposed here is primarily designed to binarize the DSS images, it can be trained on other manuscript collections as well. Additionally, the use of transfer learning makes the network readily usable for a wide range of handwritten documents, making it a multi-purpose tool for binarization. Qualitative results and several quantitative comparisons, using both historical manuscripts and datasets from the handwritten document image binarization competitions (H-DIBCO and DIBCO), demonstrate the robustness and effectiveness of the system. The best-performing network architecture proposed here is a variant of the U-Net encoder-decoder.
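To make the framing of binarization as per-pixel classification concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a segmentation network has already produced a per-pixel ink-probability map, thresholds that map into a binary mask, and scores the mask against a ground-truth mask with a pixel-level F-measure, one of the metrics commonly reported for DIBCO-style evaluation. The abstract does not state which metrics the paper uses, and the names `binarize`, `f_measure`, the 0.5 threshold, and the toy data are all illustrative assumptions.

```python
def binarize(prob_map, threshold=0.5):
    """Turn a per-pixel ink-probability map into a binary mask (1 = ink)."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_map]


def f_measure(pred, truth):
    """Pixel-level F-measure, treating ink (1) as the positive class."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1  # ink correctly detected
            elif p == 1 and t == 0:
                fp += 1  # background mistaken for ink
            elif p == 0 and t == 1:
                fn += 1  # ink missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 3x4 probability map standing in for a network's output.
prob_map = [
    [0.9, 0.8, 0.1, 0.2],
    [0.7, 0.4, 0.6, 0.1],
    [0.2, 0.1, 0.3, 0.95],
]
# Hypothetical ground-truth ink mask for the same patch.
truth = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]

mask = binarize(prob_map)
score = f_measure(mask, truth)
print(mask)
print(round(score, 3))
```

In this toy patch the mask misses one ink pixel and marks one background pixel as ink, so precision and recall are both 4/5. A fixed threshold is the simplest choice here; a real pipeline might tune it on validation data.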

research
10/02/2022

DARE: A large-scale handwritten date recognition system

Handwritten text recognition for historical documents is an important ta...
research
12/08/2019

ICDAR 2019 Competition on Image Retrieval for Historical Handwritten Documents

This competition investigates the performance of large-scale retrieval o...
research
06/03/2023

TransDocAnalyser: A Framework for Offline Semi-structured Handwritten Document Analysis in the Legal Domain

State-of-the-art offline Optical Character Recognition (OCR) frameworks ...
research
05/19/2021

Light-weight Document Image Cleanup using Perceptual Loss

Smartphones have enabled effortless capturing and sharing of documents i...
research
09/06/2017

Automatic Document Image Binarization using Bayesian Optimization

Document image binarization is often a challenging task due to various f...
research
08/17/2022

Boosting Modern and Historical Handwritten Text Recognition with Deformable Convolutions

Handwritten Text Recognition (HTR) in free-layout pages is a challenging...
research
01/23/2020

Text Extraction and Restoration of Old Handwritten Documents

Image restoration is a very crucial computer vision task. This paper descr...
