BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection

05/05/2022
by Min Yang, et al.

Temporal action detection (TAD) is extensively studied in the video understanding community, largely following the object detection pipelines developed for images. However, complex designs are common in TAD, such as two-stream feature extraction, multi-stage training, complex temporal modeling, and global context fusion. In this paper, we do not aim to introduce any novel technique for TAD. Instead, given the current status of complex designs and low efficiency in TAD, we study a simple, straightforward, yet must-know baseline. In our simple baseline (BasicTAD), we decompose the TAD pipeline into several essential components: data sampling, backbone design, neck construction, and detection head. We empirically investigate the existing techniques for each component of this baseline and, more importantly, perform end-to-end training over the entire pipeline thanks to its simple design. Our BasicTAD yields an astounding RGB-only baseline that comes very close to state-of-the-art methods with two-stream inputs. In addition, we further improve BasicTAD by preserving more temporal and spatial information in the network representations (termed BasicTAD Plus). Empirical results demonstrate that BasicTAD Plus is very efficient and significantly outperforms previous methods on the THUMOS14 and FineAction datasets. Our approach can serve as a strong baseline for TAD. The code will be released at https://github.com/MCG-NJU/BasicTAD.
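The four components named in the abstract can be pictured as successive stages of a one-stage detection pipeline. The following is only a minimal plain-Python sketch of that decomposition; all function names, shapes, and the anchor scheme are hypothetical illustrations, not the authors' actual implementation.

```python
# Illustrative sketch of a four-stage TAD pipeline:
# data sampling -> backbone -> neck -> detection head.
# Everything here is a hypothetical stand-in, not BasicTAD's real code.

def sample_frames(video_len, num_frames=8):
    """Data sampling: uniformly pick frame indices from the video."""
    step = video_len / num_frames
    return [int(i * step) for i in range(num_frames)]

def backbone(frames):
    """Stand-in per-frame feature extractor (e.g. an RGB CNN)."""
    return [[f * 0.1] for f in frames]  # one toy feature per frame

def neck(features, stride=2):
    """Neck: temporal downsampling by averaging adjacent frame features."""
    return [
        [(a + b) / 2 for a, b in zip(features[i], features[i + 1])]
        for i in range(0, len(features) - 1, stride)
    ]

def detection_head(features, anchor_len=2.0):
    """Head: one (start, end, score) anchor proposal per temporal location."""
    return [
        (t - anchor_len / 2, t + anchor_len / 2, sum(feat))
        for t, feat in enumerate(features)
    ]

indices = sample_frames(video_len=64, num_frames=8)
proposals = detection_head(neck(backbone(indices)))
print(len(indices), len(proposals))  # prints: 8 4
```

Because each stage is a plain function of the previous stage's output, the whole chain is differentiable in a real implementation, which is what makes the end-to-end training emphasized in the abstract possible.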

Related research:

07/09/2021 - RGB Stream Is Enough for Temporal Action Detection
State-of-the-art temporal action detectors to date are based on two-stre...

04/06/2022 - An Empirical Study of End-to-End Temporal Action Detection
Temporal action detection (TAD) is an important yet challenging task in ...

06/18/2021 - End-to-end Temporal Action Detection with Transformer
Temporal action detection (TAD) aims to determine the semantic label and...

07/20/2020 - Context-Aware RCNN: A Baseline for Action Detection in Videos
Video action detection approaches usually conduct actor-centric action r...

07/04/2022 - TANet: Transformer-based Asymmetric Network for RGB-D Salient Object Detection
Existing RGB-D SOD methods mainly rely on a symmetric two-stream CNN-bas...

04/16/2020 - Asynchronous Interaction Aggregation for Action Detection
Understanding interaction is an essential part of video action detection...

03/28/2023 - STMixer: A One-Stage Sparse Action Detector
Traditional video action detectors typically adopt the two-stage pipelin...
