Semantic Image Networks for Human Action Recognition

01/21/2019
by Sunder Ali Khowaja, et al.

In this paper, we propose the semantic image, an improved representation for video analysis, used principally in combination with Inception networks. The semantic image is obtained by applying localized sparse segmentation using global clustering (LSSGC) prior to approximate rank pooling, which summarizes the motion characteristics of a video in a single image or in multiple images. It incorporates background information by overlaying a static background from the window onto the subsequent segmented frames. The idea is to improve the action-motion dynamics by focusing on the regions that are important for action recognition and by encoding the temporal variances with the frame-ranking method. We also propose a sequential combination of Inception-ResNetv2 and a long short-term memory network (LSTM) to leverage the temporal variances for improved recognition performance. Extensive analysis has been carried out on the UCF101 and HMDB51 datasets, which are widely used in action recognition studies. We show that (i) the semantic image generates better activations and converges faster than its original variant, (ii) using segmentation prior to approximate rank pooling yields better recognition performance, (iii) the use of LSTM leverages the temporal variance information from approximate rank pooling to model action behavior better than the base network, (iv) the proposed representations are adaptive, as they can be used with existing methods such as temporal segment networks to improve recognition performance, and (v) our proposed four-stream network architecture, comprising semantic images and semantic optical flows, achieves state-of-the-art performance of 95.9 and … on UCF101 and HMDB51, respectively.
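The pipeline described above (segment, overlay a static background, then apply approximate rank pooling) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: LSSGC is not reproduced, so the foreground masks are assumed to come from any segmenter; the "static background" is assumed to be the first frame of the window; and the final rescaling to [0, 255] is an illustrative choice. The pooling weights follow the approximate rank pooling formula from the dynamic image work (Bilen et al.), alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), where H_t is the t-th harmonic number. The helper names (`arp_coefficients`, `overlay_static_background`, `semantic_image`) are hypothetical.

```python
import numpy as np


def arp_coefficients(T: int) -> np.ndarray:
    """Approximate rank pooling weights alpha_t for t = 1..T:
    alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}), with H_t the t-th harmonic number."""
    # H[0] = 0, H[t] = 1 + 1/2 + ... + 1/t
    H = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, T + 1))))
    t = np.arange(1, T + 1)
    return 2.0 * (T - t + 1) - (T + 1) * (H[T] - H[t - 1])


def approximate_rank_pool(frames: np.ndarray) -> np.ndarray:
    """Collapse a (T, H, W, C) clip into one (H, W, C) image: sum_t alpha_t * frame_t."""
    alpha = arp_coefficients(frames.shape[0])
    return np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))


def overlay_static_background(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Hypothetical compositing step (assumption): keep foreground pixels (mask == 1)
    of every frame and fill background pixels with the first frame of the window."""
    frames = frames.astype(np.float64)
    background = frames[0]                   # (H, W, C), assumed static background
    m = masks[..., None].astype(np.float64)  # (T, H, W, 1), broadcast over channels
    return m * frames + (1.0 - m) * background


def semantic_image(frames: np.ndarray, masks: np.ndarray) -> np.ndarray:
    """Sketch of a semantic image: composite, rank-pool, then min-max rescale
    to [0, 255] (the rescaling is an assumption, not taken from the paper)."""
    d = approximate_rank_pool(overlay_static_background(frames, masks))
    d = d - d.min()
    peak = d.max()
    # Treat a numerically flat result as all zeros instead of amplifying round-off.
    return 255.0 * d / peak if peak > 1e-6 else d
```

For a clip of shape `(T, H, W, 3)` and masks of shape `(T, H, W)`, `semantic_image(clip, masks)` returns one `(H, W, 3)` image that can be fed to an image CNN. Because the alpha_t sum to zero, pixels that remain background throughout the window cancel out in the pooled result, so in this sketch the output is driven by the segmented foreground motion.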

12/02/2016

Action Recognition with Dynamic Image Networks

We introduce the concept of "dynamic image", a novel compact representat...
02/26/2019

IF-TTN: Information Fused Temporal Transformation Network for Video Action Recognition

Effective spatiotemporal feature representation is crucial to the video-...
07/18/2017

Skeleton Based Human Action Recognition with Global Context-Aware Attention LSTM Networks

Human action recognition in 3D skeleton sequences has attracted a lot of...
05/06/2020

Hybrid and hierarchical fusion networks: a deep cross-modal learning architecture for action recognition

Two-stream networks have provided an alternate way of exploiting the spa...
05/24/2017

Sequence Summarization Using Order-constrained Kernelized Feature Subspaces

Representations that can compactly and effectively capture temporal evol...
11/03/2022

Quantifying and Learning Static vs. Dynamic Information in Deep Spatiotemporal Networks

There is limited understanding of the information captured by deep spati...
09/06/2016

Making a Case for Learning Motion Representations with Phase

This work advocates Eulerian motion representation learning over the cur...
