Aggregated Text Transformer for Scene Text Detection

11/25/2022
by Zhao Zhou, et al.

This paper explores a multi-scale aggregation strategy for scene text detection in natural images. We present the Aggregated Text TRansformer (ATTR), which represents text in scene images with a multi-scale self-attention mechanism. Starting from an image pyramid with multiple resolutions, features are first extracted at each scale with shared weights and then fed into a Transformer encoder-decoder architecture. The multi-scale image representations are robust and carry rich information about text of various sizes. The text Transformer aggregates these features to learn interactions across scales and improve the text representation. The proposed method detects scene text by representing each text instance as an individual binary mask, which tolerates curved text and regions with dense instances. Extensive experiments on public scene text detection datasets demonstrate the effectiveness of the proposed framework.
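
The aggregation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a per-pixel linear map in place of the real feature extractor, average pooling to build the pyramid, and a single self-attention layer with identity projections in place of the Transformer encoder-decoder, and it omits the mask prediction entirely. All function names and parameters (`downsample`, `extract_tokens`, `aggregate_multiscale`, `factors`, `weights`) are hypothetical.

```python
import math

def downsample(img, factor):
    """Average-pool a 2D list image by an integer factor (one pyramid level)."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + dy][x * factor + dx]
                 for dy in range(factor) for dx in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]

def extract_tokens(img, weights):
    """Stand-in feature extractor: each pixel becomes one token via the same linear map."""
    return [[w * v for w in weights] for row in img for v in row]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (an assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        attn = softmax(scores)
        out.append([sum(a * v[j] for a, v in zip(attn, tokens)) for j in range(d)])
    return out

def aggregate_multiscale(img, factors=(1, 2), weights=(0.5, -0.25, 1.0)):
    """Build a pyramid, extract tokens with shared weights, attend across all scales."""
    pyramid = [downsample(img, f) for f in factors]
    tokens = []
    for level in pyramid:
        tokens.extend(extract_tokens(level, weights))  # same weights at every scale
    # Tokens from every scale sit in one sequence, so attention mixes across scales.
    return self_attention(tokens)

image = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
fused = aggregate_multiscale(image)
print(len(fused), len(fused[0]))  # 16 + 4 tokens, 3 channels each
```

Concatenating the tokens from all pyramid levels into a single sequence is what lets each token attend to features at other resolutions; a real model would add positional and scale embeddings, learned projections, and a decoder that outputs per-instance masks.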


