Slow-Fast Visual Tempo Learning for Video-based Action Recognition

02/24/2022
by Yuanzhong Liu, et al.

Action visual tempo characterizes the dynamics and the temporal scale of an action, which helps distinguish human actions that are highly similar in visual dynamics and appearance. Previous methods capture visual tempo either by sampling raw videos at multiple rates, which requires a costly multi-layer network to handle each rate, or by hierarchically sampling backbone features, which relies heavily on high-level features that miss fine-grained temporal dynamics. In this work, we propose a Temporal Correlation Module (TCM), which can be easily embedded into current action recognition backbones in a plug-and-play manner to extract action visual tempo from low-level backbone features at a single layer. Specifically, our TCM contains two main components: a Multi-scale Temporal Dynamics Module (MTDM) and a Temporal Attention Module (TAM). MTDM applies a correlation operation to learn pixel-wise, fine-grained temporal dynamics for both fast and slow tempos. TAM adaptively emphasizes expressive features and suppresses inessential ones by analyzing global information across the various tempos. Extensive experiments on several action recognition benchmarks, including Something-Something V1 & V2, Kinetics-400, UCF-101, and HMDB-51, demonstrate that the proposed TCM improves the performance of existing video-based action recognition models by a large margin. The source code is publicly released at https://github.com/zphyix/TCM.
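The abstract describes two mechanisms: a correlation operation that compares each pixel's features across frames at fast and slow tempos, and an attention step that reweights the tempos using global information. The NumPy sketch below illustrates that general idea under our own assumptions; it is not the TCM from the released repository. The function names (`local_correlation`, `tempo_correlation`, `tempo_attention_fuse`), the use of frame strides as "tempos", the displacement radius, and the parameter-free softmax attention are all hypothetical choices for illustration. The actual MTDM and TAM use learned layers described in the full paper and code.

```python
import numpy as np


def local_correlation(feat_a, feat_b, radius=1):
    """Pixel-wise correlation between two (C, H, W) feature maps.

    For each pixel of feat_a, take the channel-averaged dot product with
    feat_b at every displacement (dy, dx) in [-radius, radius]^2.
    Returns an array of shape ((2*radius+1)**2, H, W).
    """
    c, h, w = feat_a.shape
    # Zero-pad feat_b spatially so every shifted window stays in bounds.
    padded = np.pad(feat_b, ((0, 0), (radius, radius), (radius, radius)))
    out = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[:, dy:dy + h, dx:dx + w]
            out.append((feat_a * shifted).sum(axis=0) / c)
    return np.stack(out, axis=0)


def tempo_correlation(video_feat, stride, radius=1):
    """Correlate frame t with frame t+stride (clamped at the last frame).

    ASSUMPTION: a small stride stands in for a fast tempo and a large
    stride for a slow tempo. Input (T, C, H, W) -> output (T, K, H, W).
    """
    t = video_feat.shape[0]
    maps = [
        local_correlation(video_feat[i], video_feat[min(i + stride, t - 1)], radius)
        for i in range(t)
    ]
    return np.stack(maps, axis=0)


def tempo_attention_fuse(branches):
    """Fuse tempo branches with a softmax over global descriptors.

    ASSUMPTION: a parameter-free stand-in for a learned attention module.
    Each branch is pooled over (T, H, W) into a per-channel descriptor,
    and a softmax across branches gives per-channel tempo weights.
    """
    stacked = np.stack(branches, axis=0)              # (B, T, K, H, W)
    pooled = stacked.mean(axis=(1, 3, 4))             # (B, K)
    z = pooled - pooled.max(axis=0, keepdims=True)    # numerically stable softmax
    weights = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    fused = (weights[:, None, :, None, None] * stacked).sum(axis=0)
    return fused, weights


# Usage on random "low-level backbone features" of shape (T, C, H, W).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 7, 7))
fast = tempo_correlation(x, stride=1)   # fast-tempo branch
slow = tempo_correlation(x, stride=4)   # slow-tempo branch
fused, w = tempo_attention_fuse([fast, slow])
print(fused.shape)  # (8, 9, 7, 7)
```

Here stride 1 plays the role of a fast tempo and stride 4 a slow one, and each of the 9 displacement channels receives its own weighting over the two tempos. A trainable module would presumably replace the fixed softmax over pooled means with learned projections.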
