Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting

by Minghui Liao, et al.

Recent end-to-end trainable methods for scene text spotting, which integrate detection and recognition, have shown much progress. However, most current arbitrary-shape scene text spotters use region proposal networks (RPN) to produce proposals. An RPN relies heavily on manually designed anchors, and its proposals are represented as axis-aligned rectangles. The former makes it difficult to handle text instances of extreme aspect ratios or irregular shapes, and the latter often places multiple neighboring instances in a single proposal in cases of densely oriented text. To tackle these problems, we propose Mask TextSpotter v3, an end-to-end trainable scene text spotter that adopts a Segmentation Proposal Network (SPN) instead of an RPN. Our SPN is anchor-free and gives accurate representations of arbitrary-shape proposals. It is therefore superior to RPN in detecting text instances of extreme aspect ratios or irregular shapes. Furthermore, the accurate proposals produced by SPN allow masked RoI features to be used for decoupling neighboring text instances. As a result, our Mask TextSpotter v3 can handle text instances of extreme aspect ratios or irregular shapes, and its recognition accuracy is not affected by nearby text or background noise. Specifically, we outperform state-of-the-art methods by 21.9 percent on the Rotated ICDAR 2013 dataset (rotation robustness) and 5.9 percent on the Total-Text dataset (shape robustness), and we achieve state-of-the-art performance on the MSRA-TD500 dataset (aspect ratio robustness). Code is available at:
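
To make the idea of masked RoI features concrete, here is a minimal sketch in plain Python. It is not the authors' released code. It assumes that masking means multiplying the RoI feature map element-wise by a binary mask rasterized from the polygon proposal, which the abstract does not spell out. The names `point_in_polygon`, `polygon_mask`, and `mask_roi_features` are hypothetical, and the polygon is assumed to be given in RoI-local pixel coordinates.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd ray-casting test for a point against a closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_mask(polygon: Sequence[Point], height: int, width: int) -> List[List[int]]:
    """Rasterize a polygon into a binary H x W mask, sampling pixel centers."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, polygon) else 0
         for col in range(width)]
        for row in range(height)
    ]


def mask_roi_features(
    features: List[List[List[float]]], mask: List[List[int]]
) -> List[List[List[float]]]:
    """Zero out every feature value outside the mask (assumed operation).

    features has shape [C][H][W]; mask has shape [H][W].
    """
    return [
        [[value * m for value, m in zip(feat_row, mask_row)]
         for feat_row, mask_row in zip(channel, mask)]
        for channel in features
    ]


# A 4 x 6 RoI whose axis-aligned box also covers a neighboring instance on
# the right; the polygon proposal covers only the left three columns.
roi_polygon = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]
roi_features = [[[1.0] * 6 for _ in range(4)]]  # one channel, all ones
roi_mask = polygon_mask(roi_polygon, height=4, width=6)
masked = mask_roi_features(roi_features, roi_mask)
```

In this toy case the features in the right three columns, which would belong to the neighboring instance or to background, are zeroed before recognition, while those inside the polygon pass through unchanged.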




CentripetalText: An Efficient Text Instance Representation for Scene Text Detection

Scene text detection remains a grand challenge due to the variation in t...

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Recently, models based on deep neural networks have dominated the fields...

Mask is All You Need: Rethinking Mask R-CNN for Dense and Arbitrary-Shaped Scene Text Detection

Due to the large success in object detection and instance segmentation, ...

Kernel Proposal Network for Arbitrary Shape Text Detection

Segmentation-based methods have achieved great success for arbitrary sha...

Detecting Multi-Oriented Text with Corner-based Region Proposals

Previous approaches for scene text detection usually rely on manually de...

Character Proposal Network for Robust Text Extraction

Maximally stable extremal regions (MSER), which is a popular method to g...

Detecting Curve Text in the Wild: New Dataset and New Solution

Scene text detection has made great progress in recent years. The d...
