Zoom Text Detector

09/07/2022
by Chuang Yang, et al.

To pursue comprehensive performance, recent text detectors improve detection speed at the expense of accuracy. They adopt shrink-mask based text representation strategies, which makes detection accuracy highly dependent on the shrink-masks. Unfortunately, three disadvantages make the shrink-masks unreliable. First, these methods try to strengthen the discrimination of shrink-masks from the background through semantic information, but the feature defocusing phenomenon, in which coarse layers are optimized by fine-grained objectives, limits the extraction of semantic features. Second, since both shrink-masks and the margins belong to texts, the detail loss phenomenon, in which the margins are ignored, hinders distinguishing shrink-masks from the margins and causes ambiguous shrink-mask edges. Third, false-positive samples share similar visual features with shrink-masks, which further degrades shrink-mask recognition. To avoid these problems, we propose a Zoom Text Detector (ZTD) inspired by the zoom process of a camera. Specifically, a Zoom Out Module (ZOM) is introduced to provide coarse-grained optimization objectives for coarse layers and thus avoid feature defocusing. Meanwhile, a Zoom In Module (ZIM) is presented to enhance recognition of the margins and prevent detail loss. Furthermore, a Sequential-Visual Discriminator (SVD) is designed to suppress false-positive samples using sequential and visual features. Experiments verify the superior comprehensive performance of ZTD.
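
For readers unfamiliar with the terms, the relationship between a text region, its shrink-mask, and the margin can be illustrated with a minimal sketch. It is not taken from the paper: it assumes an axis-aligned rectangular text region and the inward offset d = A(1 - r^2)/L used by earlier shrink-mask detectors such as PSENet and DBNet, and the abstract does not confirm that ZTD uses this rule. The function names and the default ratio are hypothetical.

```python
def shrink_offset(area, perimeter, ratio):
    """Inward offset d = A * (1 - r^2) / L.

    Assumption: this is the PSENet/DBNet-style rule, used here only
    for illustration; the abstract does not state ZTD's rule.
    """
    return area * (1.0 - ratio ** 2) / perimeter


def shrink_rect(x0, y0, x1, y1, ratio=0.4):
    """Shrink an axis-aligned text rectangle inward on every side."""
    w, h = x1 - x0, y1 - y0
    d = shrink_offset(w * h, 2 * (w + h), ratio)
    d = min(d, w / 2, h / 2)  # clamp so the rectangle never inverts
    return (x0 + d, y0 + d, x1 - d, y1 - d)


def rasterize(rect, width, height):
    """Binary mask: 1 where the pixel centre lies inside the rectangle."""
    x0, y0, x1, y1 = rect
    return [
        [1 if (x0 <= x + 0.5 <= x1 and y0 <= y + 0.5 <= y1) else 0
         for x in range(width)]
        for y in range(height)
    ]


def margin_mask(text_mask, shrink_mask):
    """Margin = text pixels that are not part of the shrink-mask."""
    return [
        [t - s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(text_mask, shrink_mask)
    ]


text_rect = (2, 2, 18, 10)
shrunk_rect = shrink_rect(*text_rect, ratio=0.5)
text = rasterize(text_rect, 20, 12)
shrink = rasterize(shrunk_rect, 20, 12)
margin = margin_mask(text, shrink)
```

In this toy case the margin is a band of text pixels surrounding the shrink-mask. Both belong to the text, which is why the abstract describes the shrink-mask edge as hard to distinguish when the margin is ignored.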

research
12/02/2021

Visual-Semantic Transformer for Scene Text Recognition

Modeling semantic information is helpful for scene text recognition. In ...
research
02/28/2023

Focus On Details: Online Multi-object Tracking with Diverse Fine-grained Representation

Discriminative representation is essential to keep a unique identifier f...
research
06/27/2022

TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask

Arbitrary-shaped scene text detection is a challenging task due to the v...
research
07/22/2019

Quadruplet Selection Methods for Deep Embedding Learning

Recognition of objects with subtle differences has been used in many pra...
research
07/15/2023

Theoretical Analysis of Binary Masks in Snapshot Compressive Imaging Systems

Snapshot compressive imaging (SCI) systems have gained significant atten...
research
11/09/2018

A Fully Automated System for Sizing Nasal PAP Masks Using Facial Photographs

We present a fully automated system for sizing nasal Positive Airway Pre...
research
08/04/2021

What's Wrong with the Bottom-up Methods in Arbitrary-shape Scene Text Detection

The latest trend in the bottom-up perspective for arbitrary-shape scene ...