Improving Document Binarization via Adversarial Noise-Texture Augmentation

10/25/2018
by Ankan Kumar Bhunia, et al.

Binarization of degraded document images is an elementary step in most problems in the document image analysis domain. This paper revisits the binarization problem by introducing an adversarial learning approach. We construct a Texture Augmentation Network that transfers the texture elements of a degraded reference document image to a clean binary image. In this way, the network creates multiple versions of the same textual content with various noisy textures, thus enlarging the available document binarization datasets. Finally, the newly generated images are passed through a Binarization Network to recover the clean version. By jointly training the two networks, we increase the adversarial robustness of our system. Notably, our model can learn from unpaired data. Experimental results suggest that the proposed method achieves superior performance on the widely used DIBCO datasets.
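The data flow described above (clean binary image, then textured copies, then binarized back and compared with the clean source) can be illustrated with a toy sketch. This is not the authors' method: both learned networks are replaced by hypothetical hand-crafted stand-ins (`transfer_texture` pastes a reference texture onto paper pixels, `binarize` is a fixed global threshold), and the random "degraded textures" from `make_texture` are assumptions made only to keep the example self-contained. In this toy, the clean input doubles as the target for every textured copy, so the reference textures need no ground-truth binarizations of their own; that is our reading of how such a pipeline can use unpaired data, not a detail stated in the abstract.

```python
import random

# Grayscale images as nested lists: 0.0 is black ink, 1.0 is white paper.
Image = list[list[float]]


def make_texture(height: int, width: int, seed: int) -> Image:
    """Hypothetical degraded-paper texture: random gray levels in [0.4, 1.0]."""
    rng = random.Random(seed)
    return [[rng.uniform(0.4, 1.0) for _ in range(width)] for _ in range(height)]


def transfer_texture(clean: Image, reference: Image, ink_scale: float = 0.1) -> Image:
    """Toy, non-learned stand-in for a texture augmentation network.

    Paper (white) pixels of the clean binary image copy the reference
    texture; ink (black) pixels stay dark but are modulated by it.
    """
    return [
        [r if c >= 0.5 else ink_scale * r for c, r in zip(clean_row, ref_row)]
        for clean_row, ref_row in zip(clean, reference)
    ]


def binarize(image: Image, threshold: float = 0.25) -> Image:
    """Toy stand-in for a binarization network: a fixed global threshold."""
    return [[1.0 if p >= threshold else 0.0 for p in row] for row in image]


def pixel_error(a: Image, b: Image) -> float:
    """Fraction of pixels where two binary images disagree."""
    total = sum(len(row) for row in a)
    wrong = sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return wrong / total


# A tiny "clean binary document": a 5x7 glyph-like stroke pattern.
clean = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
clean = [[float(p) for p in row] for row in clean]

# Several textured copies of the same textual content, one per reference texture.
augmented = [transfer_texture(clean, make_texture(5, 7, seed)) for seed in range(3)]

# Each copy is binarized and scored against the clean image it was made from.
errors = [pixel_error(binarize(img), clean) for img in augmented]
print(errors)  # → [0.0, 0.0, 0.0]
```

The errors are zero only because the toy textures never darken paper below the fixed threshold; in the adversarial setting the abstract describes, the texture network is trained jointly with the binarizer, so the generated degradations are not limited to such easy cases.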
