Representing Videos as Discriminative Sub-graphs for Action Recognition

01/11/2022
by Dong Li, et al.

Human actions typically have combinatorial structures or patterns, i.e., subjects, objects, and the spatio-temporal interactions between them. Discovering such structures is therefore a rewarding way to reason about the dynamics of interactions and to recognize the actions. In this paper, we introduce a new design of sub-graphs to represent and encode the discriminative patterns of each action in videos. Specifically, we present the MUlti-scale Sub-graph LEarning (MUSLE) framework, which builds space-time graphs and clusters them into compact sub-graphs on each scale, where a scale is defined by the number of nodes. Technically, MUSLE produces 3D bounding boxes, i.e., tubelets, in each video clip as graph nodes and takes dense connectivity between tubelets as graph edges. For each action category, we execute online clustering to decompose the graph into sub-graphs on each scale by learning a Gaussian Mixture Layer, and we select the discriminative sub-graphs as action prototypes for recognition. Extensive experiments are conducted on the Something-Something V1 and V2 and Kinetics-400 datasets, and superior results are reported in comparison with state-of-the-art methods. More remarkably, our MUSLE achieves to-date the best reported accuracy of 65.0
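The graph construction and clustering described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the tubelet features are hypothetical low-dimensional vectors, the mixture parameters (`means`, `variances`, `weights`) are fixed by hand rather than learned online by a Gaussian Mixture Layer, and the multi-scale handling and the selection of discriminative prototypes are omitted. All function names are made up for illustration.

```python
import math


def dense_edges(num_nodes):
    """Dense connectivity: one undirected edge between every pair of tubelets."""
    return [(i, j) for i in range(num_nodes) for j in range(i + 1, num_nodes)]


def responsibilities(feature, means, variances, weights):
    """Posterior probability of each mixture component for one node feature.

    Assumes diagonal Gaussians; computed in log space for numerical stability.
    """
    log_joint = []
    for mean, var, weight in zip(means, variances, weights):
        log_prob = sum(
            -0.5 * ((x - m) ** 2 / v + math.log(2 * math.pi * v))
            for x, m, v in zip(feature, mean, var)
        )
        log_joint.append(math.log(weight) + log_prob)
    top = max(log_joint)
    unnormalized = [math.exp(value - top) for value in log_joint]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def split_into_subgraphs(features, means, variances, weights):
    """Assign each node to its most likely component and keep intra-group edges.

    Returns {component: (node_indices, edges_within_group)}.
    """
    labels = []
    for feature in features:
        resp = responsibilities(feature, means, variances, weights)
        labels.append(max(range(len(resp)), key=lambda k: resp[k]))

    subgraphs = {}
    for component in sorted(set(labels)):
        nodes = [i for i, label in enumerate(labels) if label == component]
        edges = [
            (i, j)
            for i, j in dense_edges(len(features))
            if labels[i] == component and labels[j] == component
        ]
        subgraphs[component] = (nodes, edges)
    return subgraphs


if __name__ == "__main__":
    # Four hypothetical tubelet features forming two well-separated groups.
    features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1]]
    means = [[0.0, 0.0], [5.0, 5.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    weights = [0.5, 0.5]
    print(split_into_subgraphs(features, means, variances, weights))
```

Hard assignment by the most likely component is used here only to keep the sketch short; a learned mixture layer would typically keep the soft responsibilities so that the clustering stays differentiable.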


research
12/15/2019

Action Genome: Actions as Composition of Spatio-temporal Scene Graphs

Action recognition has typically treated actions and activities as monol...
research
08/19/2022

Hierarchical Compositional Representations for Few-shot Action Recognition

Recently action recognition has received more and more attention for its...
research
11/08/2020

Right on Time: Multi-Temporal Convolutions for Human Action Recognition in Videos

The variations in the temporal performance of human actions observed in ...
research
11/26/2019

G-TAD: Sub-Graph Localization for Temporal Action Detection

Temporal action detection is a fundamental yet challenging task in video...
research
01/19/2021

Human Action Recognition Based on Multi-scale Feature Maps from Depth Video Sequences

Human action recognition is an active research area in computer vision. ...
research
09/09/2019

Gaussian Temporal Awareness Networks for Action Localization

Temporally localizing actions in a video is a fundamental challenge in v...
research
11/17/2022

Sub-Graph Learning for Spatiotemporal Forecasting via Knowledge Distillation

One of the challenges in studying the interactions in large graphs is to...
