TSI: Temporal Saliency Integration for Video Action Recognition

06/02/2021
by Haisheng Su, et al.

Efficient spatiotemporal modeling is an important yet challenging problem for video action recognition. Existing state-of-the-art methods exploit motion cues to assist short-term temporal modeling by taking temporal differences over consecutive frames. However, camera movement inevitably introduces background noise into these differences. Moreover, the movements of different actions can vary greatly. In this paper, we propose a Temporal Saliency Integration (TSI) block, which mainly consists of a Salient Motion Excitation (SME) module and a Cross-scale Temporal Integration (CTI) module. Specifically, SME highlights motion-sensitive areas through local-global motion modeling, in which background suppression and pyramidal feature differencing are applied successively between neighboring frames to capture motion dynamics with less background noise. CTI performs multi-scale temporal modeling through a group of separate 1D convolutions and integrates temporal interactions across the different scales with an attention mechanism. With these two modules, both long-term and short-term temporal relationships can be encoded efficiently while introducing only a limited number of additional parameters. Extensive experiments on several popular benchmarks (i.e., Something-Something V1 and V2, Kinetics-400, UCF-101, and HMDB-51) demonstrate the effectiveness of the proposed method.
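The two ideas in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`motion_excitation`, `cross_scale_integration`), the fixed averaging kernels, and the parameter-free softmax attention are all assumptions made for illustration, whereas the actual SME and CTI modules use learned layers, background suppression, and pyramidal differences that are not reproduced here. The sketch only shows (1) gating features with a frame-to-frame difference and (2) fusing several 1D temporal convolutions of different kernel sizes with attention weights.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def motion_excitation(feats):
    """Gate features by temporal difference (hypothetical simplification of SME).

    feats: array of shape (T, C, H, W).
    Returns (excited_feats, gate), where gate has shape (T, 1, H, W).
    """
    diff = np.zeros_like(feats)
    # Difference between each frame and its successor; the last frame has
    # no successor, so its difference stays zero.
    diff[:-1] = feats[1:] - feats[:-1]
    # Channel-averaged absolute difference as a crude spatial saliency map.
    saliency = np.abs(diff).mean(axis=1, keepdims=True)
    # Subtract the per-frame spatial mean so that only locations that move
    # more than the frame average get a gate above 0.5.
    gate = sigmoid(saliency - saliency.mean(axis=(2, 3), keepdims=True))
    # Residual excitation: original features plus their gated copy.
    return feats + feats * gate, gate


def temporal_conv(x, kernel):
    """1D convolution along time with zero 'same' padding.

    x: (T, C); kernel: (K,) with odd K, shared across channels (assumption).
    """
    k = len(kernel)
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack(
        [(xp[t:t + k] * kernel[:, None]).sum(axis=0) for t in range(x.shape[0])]
    )


def cross_scale_integration(x, kernels):
    """Fuse multi-scale temporal branches with softmax attention
    (hypothetical simplification of CTI; real attention would be learned).
    """
    branches = np.stack([temporal_conv(x, k) for k in kernels])  # (S, T, C)
    # One descriptor per branch and channel, obtained by averaging over time.
    desc = branches.mean(axis=1)  # (S, C)
    e = np.exp(desc - desc.max(axis=0, keepdims=True))
    weights = e / e.sum(axis=0, keepdims=True)  # softmax over scales
    return (weights[:, None, :] * branches).sum(axis=0)  # (T, C)
```

For example, calling `cross_scale_integration(x, [np.ones(1), np.ones(3) / 3, np.ones(5) / 5])` on a `(T, C)` feature sequence mixes a per-frame branch with 3-frame and 5-frame averaging branches, which is one simple way to combine short-range and longer-range temporal context.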


