DPTNet: A Dual-Path Transformer Architecture for Scene Text Detection

08/21/2022
by Jingyu Lin, et al.

The success of deep learning has driven rapid progress in scene text detection. Among methods built on convolutional networks, segmentation-based ones have drawn extensive attention because they excel at detecting text instances of arbitrary shapes and extreme aspect ratios. However, these bottom-up methods are limited by the performance of their segmentation models. In this paper, we propose DPTNet (Dual-Path Transformer Network), a simple yet effective architecture that models both global and local information for the scene text detection task. We further propose a parallel design that integrates the convolutional network with a powerful self-attention mechanism, so that the attention path and the convolutional path provide complementary clues to each other. Moreover, a bi-directional interaction module across the two paths is developed to provide complementary clues in the channel and spatial dimensions. We also upgrade the concentration operation by adding an extra multi-head attention layer to it. Our DPTNet achieves state-of-the-art results on the MSRA-TD500 dataset and provides competitive results on other standard benchmarks in terms of both detection accuracy and speed.
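To make the dual-path idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a 1-D sequence of tokens, identity query/key/value projections, a fixed 3-tap averaging kernel for the convolutional path, and simple sigmoid gates for the bi-directional interaction (a channel gate computed from the convolutional path and a spatial gate computed from the attention path). All function names are hypothetical, and the extra multi-head attention layer mentioned in the abstract is omitted.

```python
import math


def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def attention_path(x):
    """Global path: single-head self-attention (assumed identity Q/K/V)."""
    scale = math.sqrt(len(x[0]))
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, x))
                    for c in range(len(q))])
    return out


def conv_path(x):
    """Local path: depthwise 1-D averaging conv, kernel 3, zero padding."""
    n, c = len(x), len(x[0])
    zero = [0.0] * c
    out = []
    for i in range(n):
        left = x[i - 1] if i > 0 else zero
        right = x[i + 1] if i < n - 1 else zero
        out.append([(left[ch] + x[i][ch] + right[ch]) / 3.0
                    for ch in range(c)])
    return out


def dual_path_block(x):
    """Run both paths in parallel, exchange gates, then join the channels."""
    n, c = len(x), len(x[0])
    attn = attention_path(x)
    conv = conv_path(x)
    # Assumed channel interaction: conv path -> attention path.
    channel_gate = [sigmoid(sum(conv[i][ch] for i in range(n)) / n)
                    for ch in range(c)]
    # Assumed spatial interaction: attention path -> conv path.
    spatial_gate = [sigmoid(sum(attn[i]) / c) for i in range(n)]
    attn_out = [[attn[i][ch] * channel_gate[ch] for ch in range(c)]
                for i in range(n)]
    conv_out = [[conv[i][ch] * spatial_gate[i] for ch in range(c)]
                for i in range(n)]
    # Joining the per-token lists yields 2 * c channels per token.
    return [a + b for a, b in zip(attn_out, conv_out)]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = dual_path_block(tokens)
print(len(fused), len(fused[0]))
```

Each of the three input tokens has two channels, so the fused output has three tokens with four channels each: the first two come from the gated attention path and the last two from the gated convolutional path.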


