CTM: Collaborative Temporal Modeling for Action Recognition

02/08/2020
by Qian Liu, et al.

With the rapid development of digital multimedia, video understanding has become an important field. In action recognition, the temporal dimension plays an important role, which sets the task apart from image recognition. To learn powerful video features, we propose a Collaborative Temporal Modeling (CTM) block (Figure 1) that learns temporal information for action recognition. CTM is a separate temporal modeling block. Besides a parameter-free identity shortcut, it includes two collaborative paths: a spatial-aware temporal modeling path, built with our proposed Temporal-Channel Convolution Module (TCCM), which uses unshared parameters for each spatial position (H*W), and a spatial-unaware temporal modeling path. CTM blocks can be inserted seamlessly into many popular networks to form CTM Networks, giving 2D CNN backbones, which capture only spatial information, the ability to learn temporal information. Experiments on several popular action recognition datasets demonstrate that CTM blocks improve the performance of 2D CNN baselines, and that our method achieves competitive results against state-of-the-art methods. Code will be made publicly available.
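The three-branch structure described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the two paths are computed in detail or how they are merged, so the sketch assumes (a) the spatial-aware path is a temporal convolution with a separate kernel per spatial position, simplified here to share weights across channels, (b) the spatial-unaware path is a per-channel temporal convolution on spatially averaged features, and (c) the branches are summed with the identity shortcut. The names `temporal_conv`, `ctm_block_sketch`, `w_aware`, and `w_unaware` are hypothetical.

```python
import numpy as np


def temporal_conv(x, kernel):
    """Zero-padded, same-length correlation along axis 0 (time).

    x:      array of shape (T, ...)
    kernel: array of shape (K, ...) with K odd; kernel[k] must broadcast
            against x's trailing dimensions.
    """
    T, K = x.shape[0], kernel.shape[0]
    pad = K // 2
    padded = np.pad(x, [(pad, pad)] + [(0, 0)] * (x.ndim - 1))
    out = np.zeros_like(x, dtype=float)
    for k in range(K):
        # Each tap weights a time-shifted copy of the input.
        out += kernel[k] * padded[k:k + T]
    return out


def ctm_block_sketch(x, w_aware, w_unaware):
    """Hypothetical CTM-style block on features x of shape (T, C, H, W).

    w_aware:   (K, 1, H, W) -> one temporal kernel per spatial position
               (assumption: unshared over H*W, shared over channels).
    w_unaware: (K, C)       -> one temporal kernel per channel, applied
               after global average pooling over H and W (assumption).
    """
    # Spatial-aware path: different temporal weights at every (h, w).
    aware = temporal_conv(x, w_aware)
    # Spatial-unaware path: discard spatial layout, model time only.
    pooled = x.mean(axis=(2, 3))                 # (T, C)
    unaware = temporal_conv(pooled, w_unaware)   # (T, C)
    # Parameter-free identity shortcut plus both paths (assumed sum).
    return x + aware + unaware[:, :, None, None]


# Usage: 4 frames, 2 channels, 2x3 spatial grid, temporal kernel size 3.
T, C, H, W, K = 4, 2, 2, 3, 3
features = np.arange(T * C * H * W, dtype=float).reshape(T, C, H, W)
out = ctm_block_sketch(features, np.zeros((K, 1, H, W)), np.zeros((K, C)))
print(out.shape)
```

With all temporal weights set to zero, the block reduces to the identity shortcut, which is what allows such a block to be inserted into a pretrained 2D backbone without initially changing its output.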

research
08/03/2020

Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition

In this work, we combine 3D convolution with late temporal modeling for ...
research
08/14/2023

On the Importance of Spatial Relations for Few-shot Action Recognition

Deep learning has achieved great success in video recognition, yet still...
research
08/05/2022

Blockwise Temporal-Spatial Pathway Network

Algorithms for video action recognition should consider not only spatial...
research
09/15/2020

Collaborative Distillation in the Parameter and Spectrum Domains for Video Action Recognition

Recent years have witnessed the significant progress of action recogniti...
research
01/19/2020

MixTConv: Mixed Temporal Convolutional Kernels for Efficient Action Recogntion

To efficiently extract spatiotemporal features of video for action recog...
research
12/10/2018

SlowFast Networks for Video Recognition

We present SlowFast networks for video recognition. Our model involves (...
research
04/25/2022

Temporal Relevance Analysis for Video Action Models

In this paper, we provide a deep analysis of temporal modeling for actio...