TadML: A fast temporal action detection with Mechanics-MLP

06/07/2022
by Bowen Deng, et al.

Temporal Action Detection (TAD) is a crucial but challenging task in video understanding. It aims to detect both the type and the start and end frames of each action instance in a long, untrimmed video. Most current models use both RGB and optical-flow streams for the TAD task, so the original RGB frames must be converted manually into optical-flow frames at additional computational and time cost, which is an obstacle to real-time processing. Many models also adopt two-stage strategies, which slow down inference and require complicated tuning of proposal generation. By comparison, we propose a one-stage, anchor-free temporal localization method that uses the RGB stream only and is built on a novel Newtonian Mechanics-MLP architecture. Its accuracy is comparable to that of existing state-of-the-art models, while its inference speed surpasses theirs by a large margin. The typical inference speed reported in this paper is 4.44 videos per second on THUMOS14. In applications, inference will be faster still because no optical-flow conversion is needed. This also demonstrates that MLPs have great potential in downstream tasks such as TAD. The source code is available at <https://github.com/BonedDeng/TadML>.
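To make the term "one-stage anchor-free temporal localization" concrete, the sketch below shows how per-time-step predictions are commonly decoded into action segments in anchor-free detectors in general. It is not taken from the TadML code: the function name `decode_anchor_free`, its parameters, and the input layout (per-step class scores plus distances to the action start and end) are all assumptions made for illustration.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, score, class_id); this layout is an assumption.
Segment = Tuple[float, float, float, int]


def decode_anchor_free(
    class_scores: List[List[float]],
    offsets: List[Tuple[float, float]],
    stride_s: float = 1.0,
    score_threshold: float = 0.5,
) -> List[Segment]:
    """Turn per-time-step predictions into action segments (hypothetical helper).

    class_scores[t][c] is the confidence that class c is active at step t.
    offsets[t] is (distance to start, distance to end), measured in steps.
    stride_s is the assumed duration of one time step in seconds.
    """
    segments: List[Segment] = []
    for t, (scores, (d_start, d_end)) in enumerate(zip(class_scores, offsets)):
        # Pick the most confident class at this time step.
        class_id = max(range(len(scores)), key=scores.__getitem__)
        score = scores[class_id]
        if score < score_threshold:
            continue  # no confident action at this step
        # No anchors or proposals: boundaries come straight from the offsets.
        start = max(0.0, (t - d_start) * stride_s)
        end = (t + d_end) * stride_s
        segments.append((start, end, score, class_id))
    return segments


if __name__ == "__main__":
    scores = [[0.1, 0.2], [0.9, 0.05], [0.3, 0.6]]
    offs = [(0.0, 0.0), (1.0, 2.0), (0.5, 0.5)]
    print(decode_anchor_free(scores, offs, stride_s=2.0))
```

A single pass over the time steps yields the segments directly, which is why a one-stage design avoids the separate proposal-generation step of two-stage methods. A real detector would typically follow this with non-maximum suppression to merge overlapping segments.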

research
07/09/2021

RGB Stream Is Enough for Temporal Action Detection

State-of-the-art temporal action detectors to date are based on two-stre...
research
12/10/2019

Flow-Distilled IP Two-Stream Networks for Compressed Video Action Recognition

Two-stream networks have achieved great success in video recognition. A ...
research
05/16/2018

Fast Retinomorphic Event Stream for Video Recognition and Reinforcement Learning

Good temporal representations are crucial for video understanding, and t...
research
03/24/2021

Learning Salient Boundary Feature for Anchor-free Temporal Action Localization

Temporal action localization is an important yet challenging task in vid...
research
11/05/2021

KORSAL: Key-point Detection based Online Real-Time Spatio-Temporal Action Localization

Real-time and online action localization in a video is a critical yet hi...
research
11/08/2022

Two-stream Multi-dimensional Convolutional Network for Real-time Violence Detection

The increasing number of surveillance cameras and security concerns have...
