End-to-End Video Text Spotting with Transformer

03/20/2022
by Weijia Wu, et al.

Recent video text spotting methods usually require a three-stage pipeline, i.e., detecting text in individual images, recognizing the localized text, and tracking text streams with post-processing to generate the final results. These methods typically follow the tracking-by-match paradigm and develop sophisticated pipelines. In this paper, rooted in Transformer sequence modeling, we propose a simple but effective end-to-end video text DEtection, Tracking, and Recognition framework (TransDETR). TransDETR has two main advantages: 1) Unlike the explicit matching paradigm between adjacent frames, TransDETR tracks and recognizes each text implicitly through a distinct query, termed a text query, over a long-range temporal sequence (more than 7 frames). 2) TransDETR is the first end-to-end trainable video text spotting framework, simultaneously addressing the three sub-tasks (i.e., text detection, tracking, and recognition). Extensive experiments on four video text datasets (i.e., ICDAR2013 Video, ICDAR2015 Video, Minetto, and YouTube Video Text) demonstrate that TransDETR achieves state-of-the-art performance with up to around 8.0 […] can be found at https://github.com/weijiawu/TransDETR.
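The abstract contrasts explicit frame-to-frame matching with implicit tracking, where each text query carries one text's identity across frames. Below is a toy Python sketch of that query-propagation idea, loosely modeled on MOTR-style tracking. The Transformer decoder is stubbed out: its per-frame outputs are passed in by hand. Every name here (`TextQuery`, `QueryTracker`, `score_thresh`, `miss_tolerance`, `final_text`) and the majority-vote transcript rule are hypothetical assumptions for illustration, not TransDETR's actual implementation.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (confidence score, updated query state, recognized transcript) for one query in one frame.
# The "state" stands in for a learned query embedding (hypothetical).
QueryOutput = Tuple[float, Tuple[float, ...], str]


@dataclass
class TextQuery:
    track_id: int
    state: Tuple[float, ...]
    misses: int = 0
    transcripts: List[str] = field(default_factory=list)


class QueryTracker:
    """Toy query propagation: identity lives in the query slot, no IoU matching."""

    def __init__(self, score_thresh: float = 0.5, miss_tolerance: int = 2):
        self.score_thresh = score_thresh        # hypothetical keep/spawn threshold
        self.miss_tolerance = miss_tolerance    # frames a query may stay below threshold
        self.tracks: Dict[int, TextQuery] = {}
        self._next_id = 0

    def step(self, track_outputs: Dict[int, QueryOutput],
             new_detections: List[QueryOutput]) -> List[int]:
        """Consume one frame of (stubbed) decoder output; return active track ids."""
        active: List[int] = []
        # 1) Propagate existing text queries: a confident query keeps its id.
        for tid, query in list(self.tracks.items()):
            score, state, text = track_outputs.get(tid, (0.0, query.state, ""))
            if score >= self.score_thresh:
                query.state, query.misses = state, 0
                query.transcripts.append(text)
                active.append(tid)
            else:
                query.misses += 1
                if query.misses > self.miss_tolerance:
                    del self.tracks[tid]  # text has left the scene
        # 2) Confident detect queries spawn new text queries (new identities).
        for score, state, text in new_detections:
            if score >= self.score_thresh:
                tid = self._next_id
                self._next_id += 1
                self.tracks[tid] = TextQuery(tid, state, transcripts=[text])
                active.append(tid)
        return active

    def final_text(self, track_id: int) -> str:
        """Majority vote over per-frame transcripts (an assumed aggregation rule)."""
        texts = [t for t in self.tracks[track_id].transcripts if t]
        return Counter(texts).most_common(1)[0][0] if texts else ""


if __name__ == "__main__":
    tracker = QueryTracker(score_thresh=0.5, miss_tolerance=1)
    print(tracker.step({}, [(0.9, (0.1,), "EXIT"), (0.8, (0.2,), "SALE")]))  # [0, 1]
    print(tracker.step({0: (0.95, (0.11,), "EXIT"), 1: (0.2, (0.2,), "")}, []))  # [0]
    print(tracker.step({0: (0.9, (0.12,), "EX1T")}, []))  # [0]; query 1 is dropped
    print(tracker.final_text(0))  # EXIT
```

The design point is that a text's identity is the query slot itself, so no box or appearance matching between adjacent frames is required. In a trained model, the decoder (not this toy) would be responsible for keeping fresh detect queries from re-detecting text that an existing text query already tracks.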

research
11/29/2021

End-to-End Referring Video Object Segmentation with Multimodal Transformers

The referring video object segmentation task (RVOS) involves segmentatio...
research
03/31/2023

Video text tracking for dense and small text based on pp-yoloe-r and sort algorithm

Although end-to-end video text spotting methods based on Transformer can...
research
03/08/2019

Efficient Video Scene Text Spotting: Unifying Detection, Tracking, and Recognition

This paper proposes a unified framework for efficiently spotting scene ...
research
05/16/2022

CONSENT: Context Sensitive Transformer for Bold Words Classification

We present CONSENT, a simple yet effective CONtext SENsitive Transformer...
research
08/20/2023

ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

In recent years, end-to-end scene text spotting approaches are evolving ...
research
08/20/2019

An End-to-end Video Text Detector with Online Tracking

Video text detection is considered as one of the most difficult tasks in...
research
11/09/2021

Video Text Tracking With a Spatio-Temporal Complementary Model

Text tracking is to track multiple texts in a video, and construct a traj...