Explore Faster Localization Learning For Scene Text Detection

07/04/2022
by Yuzhong Zhao, et al.

Obtaining a well-performing text detector based on deep networks generally requires pre-training and a long training schedule. In this paper, we present a new scene text detection network, called FANet, with Fast convergence and Accurate text localization. FANet is an end-to-end text detector based on transformer feature learning and normalized Fourier descriptor modeling, in which a Fourier Descriptor Proposal Network and an Iterative Text Decoding Network are designed to identify text proposals efficiently and accurately. We also propose a Dense Matching Strategy and a well-designed loss function to optimize network performance. Extensive experiments demonstrate that FANet can achieve SOTA performance with fewer training epochs and no pre-training. When additional data are introduced for pre-training, FANet achieves SOTA performance on MSRATD500, CTW1500 and TotalText. Ablation experiments further verify the effectiveness of our contributions.
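
To make the idea of describing a text outline with a Fourier descriptor more concrete, the following is a minimal sketch of the general technique and not FANet's implementation. It treats sampled contour points as complex numbers, computes a truncated set of Fourier coefficients, and rebuilds the contour from them. The function names and the normalization rule (zero the c_0 term to remove translation, divide by |c_1| to remove scale) are assumptions chosen for illustration; the abstract does not specify how the descriptor is normalized.

```python
import cmath
import math


def fourier_descriptor(points, k):
    """Return Fourier coefficients c_m for m in [-k, k] of a closed contour.

    points: list of (x, y) samples taken in order around the contour.
    """
    n = len(points)
    z = [complex(x, y) for x, y in points]
    coeffs = {}
    for m in range(-k, k + 1):
        # Plain discrete Fourier transform of the complex contour signal.
        total = sum(z[t] * cmath.exp(-2j * math.pi * m * t / n) for t in range(n))
        coeffs[m] = total / n
    return coeffs


def normalize(coeffs):
    """Illustrative normalization (an assumption, not taken from the paper).

    Zeroing c_0 removes translation; dividing by |c_1| removes scale.
    """
    scale = abs(coeffs[1])
    if scale == 0:
        raise ValueError("c_1 is zero; this normalization is undefined")
    return {m: (0j if m == 0 else c / scale) for m, c in coeffs.items()}


def reconstruct(coeffs, n):
    """Sample n contour points from a set of Fourier coefficients."""
    pts = []
    for t in range(n):
        z = sum(c * cmath.exp(2j * math.pi * m * t / n) for m, c in coeffs.items())
        pts.append((z.real, z.imag))
    return pts


# Example: a circle of radius 2 centered at (5, 3), sampled at 16 points.
circle = [
    (5 + 2 * math.cos(2 * math.pi * t / 16), 3 + 2 * math.sin(2 * math.pi * t / 16))
    for t in range(16)
]
descriptor = normalize(fourier_descriptor(circle, k=3))
unit_contour = reconstruct(descriptor, 16)
```

A small number of coefficients is enough to describe smooth closed shapes compactly, which is one reason Fourier descriptors are attractive for curved text outlines.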


research
09/12/2022

PreSTU: Pre-Training for Scene-Text Understanding

The ability to read and reason about texts in an image is often lacking ...
research
04/29/2022

Vision-Language Pre-Training for Boosting Scene Text Detectors

Recently, vision-language joint representation learning has proven to be...
research
07/27/2023

Adaptive Segmentation Network for Scene Text Detection

Inspired by deep convolution segmentation algorithms, scene text detecto...
research
05/21/2020

Text-to-Text Pre-Training for Data-to-Text Tasks

We study the pre-train + fine-tune strategy for data-to-text tasks. Fine...
research
11/21/2016

TextBoxes: A Fast Text Detector with a Single Deep Neural Network

This paper presents an end-to-end trainable fast scene text detector, na...
research
10/12/2021

On Exploring and Improving Robustness of Scene Text Detection Models

It is crucial to understand the robustness of text detection models with...
research
08/21/2021

Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training

Existing approaches to vision-language pre-training (VLP) heavily rely o...
