Video Is Graph: Structured Graph Module for Video Action Recognition

10/12/2021
by Rong-Chang Li, et al.

In the field of action recognition, video clips are typically treated as ordered frames for subsequent processing. To achieve spatio-temporal perception, existing approaches embed temporal interactions between adjacent frames in the convolutional layers. Global semantic information can therefore only be obtained by stacking multiple local layers hierarchically. However, such global temporal accumulation reflects only the high-level semantics in deep layers, neglecting the potential low-level holistic clues in shallow layers. In this paper, we first propose to transform a video sequence into a graph to obtain direct long-term dependencies among temporal frames. To preserve sequential information during the transformation, we devise a structured graph module (SGM), which achieves fine-grained temporal interactions throughout the entire network. In particular, SGM divides the neighbors of each node into several temporal regions so as to extract global structural information with diverse sequential flows. Extensive experiments are performed on standard benchmark datasets, i.e., Something-Something V1 and V2, Diving48, Kinetics-400, UCF101, and HMDB51. The reported performance and analysis demonstrate that SGM can achieve outstanding precision with lower computational complexity.
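
The idea of dividing each node's neighbors into temporal regions can be illustrated with a minimal sketch. This is not the authors' implementation: the three regions (past, self, future), mean pooling, and the scalar `region_weights` are assumptions chosen for illustration, standing in for whatever learned parameters and region layout SGM actually uses.

```python
from typing import Dict, List

Vector = List[float]


def temporal_regions(t: int, num_frames: int) -> Dict[str, List[int]]:
    """Split the neighbors of frame t into ordered temporal regions.

    Assumption: three regions (past, self, future); the abstract only
    says "several temporal regions".
    """
    return {
        "past": list(range(t)),
        "self": [t],
        "future": list(range(t + 1, num_frames)),
    }


def mean_vector(vectors: List[Vector], dim: int) -> Vector:
    """Average a list of vectors; an empty region contributes zeros."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def structured_aggregate(
    frames: List[Vector], region_weights: Dict[str, float]
) -> List[Vector]:
    """Pool each temporal region separately for every frame, then mix the
    pooled vectors with one weight per region (hypothetical stand-in for
    learned per-region weights)."""
    dim = len(frames[0])
    outputs = []
    for t in range(len(frames)):
        mixed = [0.0] * dim
        for name, indices in temporal_regions(t, len(frames)).items():
            pooled = mean_vector([frames[i] for i in indices], dim)
            weight = region_weights[name]
            mixed = [m + weight * p for m, p in zip(mixed, pooled)]
        outputs.append(mixed)
    return outputs


# Toy input: four frames with two-dimensional features.
frames = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
weights = {"past": -1.0, "self": 1.0, "future": 1.0}
out = structured_aggregate(frames, weights)
```

Every frame is connected to every other frame, so a single layer already sees the whole clip. Because the past and future regions carry different weights, reversing the frame order changes the output, which is one simple way a graph over frames can retain sequential information.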

research
01/19/2023

Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition

Spatial and temporal modeling is one of the most core aspects of few-sho...
research
02/24/2022

Slow-Fast Visual Tempo Learning for Video-based Action Recognition

Action visual tempo characterizes the dynamics and the temporal scale of...
research
03/17/2020

Feedback Graph Convolutional Network for Skeleton-based Action Recognition

Skeleton-based action recognition has attracted considerable attention i...
research
04/02/2019

Semantics-Guided Neural Networks for Efficient Skeleton-Based Human Action Recognition

Skeleton-based human action recognition has attracted a lot of interests...
research
01/20/2019

Visualizing Semantic Structures of Sequential Data by Learning Temporal Dependencies

While conventional methods for sequential learning focus on interaction ...
research
01/15/2023

Learning Sparse Temporal Video Mapping for Action Quality Assessment in Floor Gymnastics

Athlete performance measurement in sports videos requires modeling long ...
research
03/30/2022

Controllable Augmentations for Video Representation Learning

This paper focuses on self-supervised video representation learning. Mos...
