Multi-Stream Single Shot Spatial-Temporal Action Detection

08/22/2019
by Pengfei Zhang, et al.

We present a single shot detector based on 3D Convolutional Neural Networks (CNNs) for spatial-temporal action detection. Our model includes: (1) two short-term streams, one for appearance and one for motion, which take a single RGB image and a single optical flow image respectively, in order to capture the spatial and temporal information of the current frame; (2) two long-term 3D ConvNet based streams, which operate on sequences of consecutive RGB and optical flow images to capture the context from past frames. Our model achieves strong performance for action detection in video and can be easily integrated into any current two-stream action detection method. We report a frame-mAP of 71.30 on the UCF101-24 action dataset, the state-of-the-art result among one-stage methods. To the best of our knowledge, our work is the first system that combines a 3D CNN and SSD for action detection.
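To make the four-stream input layout concrete, the following is a minimal sketch of which frame indices each stream might read when detecting actions at frame `t`. It is an illustration only, not the authors' implementation: the function name `stream_frame_indices`, the window size `clip_len`, the choice to end the long-term window at the current frame, and the clamping of indices at the start of the video are all assumptions that the abstract does not specify.

```python
from typing import Dict, List


def stream_frame_indices(t: int, clip_len: int = 8) -> Dict[str, List[int]]:
    """Return the frame indices each of the four streams would read at time t.

    clip_len is a hypothetical window size; the abstract does not state it.
    Indices before the start of the video are clamped to 0 (an assumption).
    """
    if t < 0:
        raise ValueError("t must be non-negative")
    if clip_len < 1:
        raise ValueError("clip_len must be at least 1")

    # Long-term streams: a contiguous window of frames ending at frame t.
    window = [max(0, i) for i in range(t - clip_len + 1, t + 1)]

    return {
        # Short-term streams: only the current RGB / optical flow image.
        "short_rgb": [t],
        "short_flow": [t],
        # Long-term streams: sequences fed to the 3D ConvNets.
        "long_rgb": list(window),
        "long_flow": list(window),
    }


if __name__ == "__main__":
    for name, indices in stream_frame_indices(10, clip_len=4).items():
        print(name, indices)
```

Under these assumptions, the short-term streams see only frame `t`, while the long-term streams see the `clip_len` most recent frames; how the four streams are fused is not described in the abstract and is left out here.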

