Minimum Efforts to Build an End-to-End Spatial-Temporal Action Detector

06/07/2022
by Lin Sui, et al.

Spatial-temporal action detection is a vital part of video understanding. Current spatial-temporal action detection methods typically first use an object detector to obtain person candidate proposals and then classify those candidates into different action categories. These so-called two-stage methods are heavy and hard to apply in real-world applications. Some existing methods use a unified model structure, but they perform badly with the vanilla model and often need extra modules to boost performance. In this paper, we explore a strategy for building an end-to-end spatial-temporal action detector with minimal modifications. To this end, we propose a new method named ME-STAD, which solves the spatial-temporal action detection problem in an end-to-end manner. Besides the model design, we propose a novel labeling strategy to deal with sparse annotations in spatial-temporal datasets. The proposed ME-STAD achieves better results (2.2 [...] around 80 [...]), makes only minimal modifications to previous methods, and does not require extra components. Our code will be made public.

research
03/28/2023

STMixer: A One-Stage Sparse Action Detector

Traditional video action detectors typically adopt the two-stage pipelin...
research
05/14/2022

ETAD: A Unified Framework for Efficient Temporal Action Detection

Untrimmed video understanding such as temporal action detection (TAD) of...
research
06/13/2023

E2E-LOAD: End-to-End Long-form Online Action Detection

Recently, there has been a growing trend toward feature-based approaches...
research
12/13/2022

SST: Real-time End-to-end Monocular 3D Reconstruction via Sparse Spatial-Temporal Guidance

Real-time monocular 3D reconstruction is a challenging problem that rema...
research
12/08/2018

Spatial-Temporal Person Re-identification

Most of current person re-identification (ReID) methods neglect a spatia...
research
08/17/2022

Towards an Error-free Deep Occupancy Detector for Smart Camera Parking System

Although the smart camera parking system concept has existed for decades...
research
04/26/2021

DVMark: A Deep Multiscale Framework for Video Watermarking

Video watermarking embeds a message into a cover video in an imperceptib...
