Self-Supervised Text Erasing with Controllable Image Synthesis

by Gangwei Jiang, et al.

Recent efforts on scene text erasing have shown promising results. However, existing methods require rich yet costly label annotations to obtain robust models, which limits their use in practical applications. To this end, we study an unsupervised scenario and propose a novel Self-supervised Text Erasing (STE) framework that jointly learns to synthesize training images with erasure ground-truth and to accurately erase texts in the real world. We first design a style-aware image synthesis function that generates synthetic images with diversely styled texts based on two synthetic mechanisms. To bridge the text style gap between the synthetic and real-world data, a policy network is constructed to control the synthetic mechanisms by picking style parameters under the guidance of two specifically designed rewards. The synthetic training images with erasure ground-truth are then used to train a coarse-to-fine erasing network. To produce better erasing outputs, a triplet erasure loss is designed to force the refinement stage to recover background textures. Moreover, we provide a new dataset, PosterErase, which contains 60K high-resolution posters with texts and is more challenging for the text erasing task. The proposed method has been extensively evaluated on both PosterErase and the widely used SCUT-Enstext dataset. Notably, on PosterErase, our unsupervised method achieves an FID of 5.07, a relative improvement of 20.9% over supervised baselines.
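The idea of synthesizing training images together with their erasure ground-truth can be illustrated with a minimal Python sketch. It is not the paper's synthesis function: `sample_style` and `synthesize_pair` are hypothetical names, the image is a toy grayscale grid, the "text" is a plain rectangle, and the style parameters are drawn at random rather than picked by a policy network. The only point it shows is that when text is overlaid synthetically, the untouched input image serves as the erasure target for free.

```python
import random


def sample_style(rng):
    """Sample hypothetical style parameters (random stand-in for a learned policy)."""
    return {
        "gray": rng.randint(0, 255),     # text intensity
        "alpha": rng.uniform(0.5, 1.0),  # text opacity
        "height": rng.randint(2, 4),     # text-box height in pixels
        "width": rng.randint(4, 8),      # text-box width in pixels
    }


def synthesize_pair(background, style, rng):
    """Overlay a rectangular 'text' patch on a copy of the background.

    Returns (synthetic_image, mask, ground_truth). The ground truth is the
    unmodified background, i.e. what a perfect eraser should output.
    """
    h, w = len(background), len(background[0])
    top = rng.randint(0, h - style["height"])
    left = rng.randint(0, w - style["width"])
    image = [row[:] for row in background]  # copy so the target stays intact
    mask = [[0] * w for _ in range(h)]
    a = style["alpha"]
    for y in range(top, top + style["height"]):
        for x in range(left, left + style["width"]):
            # Alpha-blend the text intensity over the background pixel.
            image[y][x] = round(a * style["gray"] + (1 - a) * background[y][x])
            mask[y][x] = 1
    return image, mask, background


rng = random.Random(0)
background = [[rng.randint(0, 255) for _ in range(16)] for _ in range(16)]
style = sample_style(rng)
image, mask, target = synthesize_pair(background, style, rng)
```

In this sketch, an erasing model would be trained to map `image` back to `target`, with `mask` marking the pixels that were altered. How well the sampled styles match real-world text is exactly the gap the abstract's policy network is meant to close.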



RewriteNet: Realistic Scene Text Image Generation via Editing Text in Real-world Image

Scene text editing (STE), which converts a text in a scene image into th...

SIR: Self-supervised Image Rectification via Seeing the Same Scene from Multiple Different Lenses

Deep learning has demonstrated its power in image rectification by lever...

Style Generation: Image Synthesis based on Coarsely Matched Texts

Previous text-to-image synthesis algorithms typically use explicit textu...

CUT: Controllable Unsupervised Text Simplification

In this paper, we focus on the challenge of learning controllable text s...

Progressive Scene Text Erasing with Self-Supervision

Scene text erasing seeks to erase text contents from scene images and cu...

A Scene-Text Synthesis Engine Achieved Through Learning from Decomposed Real-World Data

Scene-text image synthesis techniques aimed at naturally composing text ...

DCL: Differential Contrastive Learning for Geometry-Aware Depth Synthesis

We describe a method for realistic depth synthesis that learns diverse v...
