DeepAI AI Chat
Log In Sign Up

Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild

by   Weijia Wu, et al.

Deep learning-based scene text detection can achieve preferable performance, powered with sufficient labeled training data. However, manual labeling is time consuming and laborious. At the extreme, the corresponding annotated data are unavailable. Exploiting synthetic data is a very promising solution except for domain distribution mismatches between synthetic datasets and real datasets. To address the severe domain distribution mismatch, we propose a synthetic-to-real domain adaptation method for scene text detection, which transfers knowledge from synthetic data (source domain) to real data (target domain). In this paper, a text self-training (TST) method and adversarial text instance alignment (ATA) for domain adaptive scene text detection are introduced. ATA helps the network learn domain-invariant features by training a domain classifier in an adversarial manner. TST diminishes the adverse effects of false positives (FPs) and false negatives (FNs) from inaccurate pseudo-labels. Two components have positive effects on improving the performance of scene text detectors when adapting from synthetic-to-real scenes. We evaluate the proposed method by transferring from SynthText, VISD to ICDAR2015, ICDAR2013. The results demonstrate the effectiveness of the proposed method with up to 10 improvement, which has important exploration significance for domain adaptive scene text detection. Code is available at


page 2

page 8

page 12


Self-Training for Domain Adaptive Scene Text Detection

Though deep learning based scene text detection has achieved great progr...

Domain Adaptive Scene Text Detection via Subcategorization

Most existing scene text detectors require large-scale training data whi...

SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds

With the development of deep neural networks, the demand for a significa...

What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels

Scene text recognition (STR) task has a common practice: All state-of-th...

Neural-Sim: Learning to Generate Training Data with NeRF

Training computer vision models usually requires collecting and labeling...

Can Shadows Reveal Biometric Information?

We study the problem of extracting biometric information of individuals ...