TAM: Temporal Adaptive Module for Video Recognition

05/14/2020
by   Zhaoyang Liu, et al.

Temporal modeling is crucial for capturing spatiotemporal structure in videos for action recognition. Video data exhibit extremely complex dynamics along the temporal dimension due to factors such as camera motion, speed variation, and different activities. To capture these diverse motion patterns effectively, this paper presents a new temporal adaptive module (TAM) that generates video-specific kernels from a video's own feature maps. TAM uses a two-level adaptive modeling scheme that decouples the dynamic kernel into a location-sensitive importance map and a location-invariant aggregation weight. The importance map is learned in a local temporal window to capture short-term information, while the aggregation weight is generated from a global view with a focus on long-term structure. TAM is a principled module that can be integrated into 2D CNNs to yield a powerful video architecture (TANet) at very small extra computational cost. Extensive experiments on Kinetics-400 demonstrate that TAM consistently outperforms other temporal modeling methods owing to its adaptive modeling strategy. On the Something-Something datasets, TANet achieves superior performance compared with previous state-of-the-art methods. The code will be made available soon at https://github.com/liu-zhy/TANet.
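The two-level scheme described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify layer types, so the spatial average pooling, the single 3-tap local filter `w_local`, the single linear map `w_global`, and the function name `temporal_adaptive_sketch` are all assumptions made for illustration. The sketch only shows the decoupling: an importance map computed per time step from a local window, and a per-channel aggregation kernel computed from the whole temporal axis and shared across all positions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_adaptive_sketch(x, w_local, w_global):
    """Illustrative two-level temporal aggregation (hypothetical parameters).

    x        : features of shape (C, T, H, W)
    w_local  : (C, 3) per-channel 3-tap temporal filter (assumed form)
    w_global : (T, K) linear map from the temporal axis to K kernel taps, K odd (assumed form)
    """
    C, T, H, W = x.shape
    K = w_global.shape[1]

    # Assumption: both branches see spatially averaged features of shape (C, T).
    pooled = x.mean(axis=(2, 3))

    # Local level: importance map from a 3-frame temporal window, one value per (channel, time).
    padded = np.pad(pooled, ((0, 0), (1, 1)))
    windows = np.stack([padded[:, i:i + T] for i in range(3)], axis=-1)  # (C, T, 3)
    importance = sigmoid((windows * w_local[:, None, :]).sum(axis=-1))   # (C, T)

    # Global level: one K-tap kernel per channel, generated from all T frames.
    kernel = softmax(pooled @ w_global, axis=-1)                         # (C, K)

    # Rescale features by the importance map, then aggregate over time with
    # the video-specific kernel (same kernel at every temporal and spatial location).
    z = x * importance[:, :, None, None]
    pad = K // 2
    z_pad = np.pad(z, ((0, 0), (pad, pad), (0, 0), (0, 0)))
    y = np.zeros_like(x)
    for k in range(K):
        y += kernel[:, k, None, None, None] * z_pad[:, k:k + T]
    return y, importance, kernel

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 5, 5))  # C=4 channels, T=8 frames, 5x5 spatial grid
y, v, theta = temporal_adaptive_sketch(
    x, rng.standard_normal((4, 3)), rng.standard_normal((8, 3))
)
print(y.shape, v.shape, theta.shape)
```

Because both the importance map and the kernel are computed from the input itself, two different videos passed through the same parameters are aggregated with different temporal weights, which is the sense in which the kernels are video-specific.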

