Temporal-Spatial Mapping for Action Recognition

09/11/2018
by Xiaolin Song, et al.

Deep learning models have enjoyed great success in image-related computer vision tasks such as image classification and object detection. For video-related tasks such as human action recognition, however, progress has been less significant. The main challenge is the lack of effective and efficient models for capturing the rich temporal-spatial information in a video. We introduce a simple yet effective operation, termed Temporal-Spatial Mapping (TSM), that captures the temporal evolution of the frames by jointly analyzing all the frames of a video. We propose a video-level 2D feature representation, referred to as VideoMap, obtained by transforming the convolutional features of all frames into a 2D feature map. Because each row is the vectorized feature representation of a frame, the temporal-spatial features are compactly represented and the temporal dynamic evolution is well embedded. Based on the VideoMap representation, we further propose a temporal attention model within a shallow convolutional neural network to efficiently exploit the temporal-spatial dynamics. The experimental results show that the proposed scheme achieves state-of-the-art performance, with a 4.2% accuracy gain over Temporal Segment Network (TSN), a competing baseline method, on the challenging human action benchmark dataset HMDB51.
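To make the VideoMap idea concrete, here is a minimal NumPy sketch of how per-frame convolutional features might be stacked into a 2D map with one row per frame. The function names, the `(T, C, H, W)` feature layout, and the softmax-weighted pooling are all assumptions for illustration. In particular, the pooling step is only a generic stand-in for temporal attention, not the shallow-CNN attention model the authors describe.

```python
import numpy as np


def build_video_map(frame_features):
    """Flatten each frame's feature tensor into one row of a 2D map.

    frame_features: array of shape (T, C, H, W) (assumed layout).
    Returns an array of shape (T, C * H * W), rows in temporal order.
    """
    feats = np.asarray(frame_features, dtype=np.float64)
    num_frames = feats.shape[0]
    return feats.reshape(num_frames, -1)


def temporal_attention_pool(video_map, scores):
    """Generic softmax attention over frames (illustrative stand-in only).

    scores: one scalar relevance score per frame (hypothetical input).
    Returns the attention-weighted sum of rows and the weights themselves.
    """
    scores = np.asarray(scores, dtype=np.float64)
    weights = np.exp(scores - scores.max())  # subtract max for stability
    weights /= weights.sum()
    return weights @ video_map, weights


# Example: 25 frames, each with an 8 x 7 x 7 feature tensor (made-up sizes).
rng = np.random.default_rng(0)
features = rng.standard_normal((25, 8, 7, 7))
video_map = build_video_map(features)
print(video_map.shape)  # (25, 392)
```

Keeping the rows in temporal order is what lets a 2D convolution over the map see how features change from frame to frame. Any learned attention scores would replace the hypothetical `scores` argument here.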

