Spatio-temporal Relation Modeling for Few-shot Action Recognition

12/09/2021
by Anirudh Thatipelli, et al.

We propose a novel few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations. The core of our approach is a novel spatio-temporal enrichment module that aggregates spatial and temporal contexts with dedicated local patch-level and global frame-level feature enrichment sub-modules. Local patch-level enrichment captures the appearance-based characteristics of actions. Global frame-level enrichment, on the other hand, explicitly encodes the broad temporal context, thereby capturing the relevant object features over time. The resulting spatio-temporally enriched representations are then used to learn the relational matching between query and support action sub-sequences. We further introduce a query-class similarity classifier on the patch-level enriched features, which enhances class-specific feature discriminability by reinforcing feature learning at different stages of the proposed framework. Experiments are performed on four few-shot action recognition benchmarks: Kinetics, SSv2, HMDB51 and UCF101. Our extensive ablation study reveals the benefits of the proposed contributions. Furthermore, our approach sets a new state-of-the-art on all four benchmarks. On the challenging SSv2 benchmark, our approach achieves an absolute gain of 3.5% in classification accuracy over the best existing method in the literature. Our code and models will be publicly released.
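The abstract describes the pipeline only at a high level, so the following is a minimal NumPy sketch under assumptions of our own. It is not the STRM implementation. Patch-level enrichment is assumed to be single-head self-attention among the patches of each frame. Frame-level enrichment is assumed to be an MLP that mixes features across frames. The learned relational matching is replaced by a simple best-match cosine score over 2-frame sub-sequences. The query-class similarity classifier is replaced by a nearest-class-prototype rule on pooled patch features. All function names, shapes and the random weights are hypothetical.

```python
import numpy as np
from itertools import combinations


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(T, D, H, rng):
    """Hypothetical random weights standing in for learned parameters."""
    attn = tuple(rng.normal(0, 1 / np.sqrt(D), (D, D)) for _ in range(3))  # Wq, Wk, Wv
    mlp = (rng.normal(0, 1 / np.sqrt(T), (H, T)),   # W1: mixes T frames into H hidden units
           rng.normal(0, 1 / np.sqrt(H), (T, H)))   # W2: maps hidden units back to T frames
    return {"attn": attn, "mlp": mlp}


def patch_enrich(feats, Wq, Wk, Wv):
    """Assumed local patch-level enrichment: single-head self-attention among
    the P patches of each frame, added back residually. feats: (T, P, D)."""
    q, k, v = feats @ Wq, feats @ Wk, feats @ Wv
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(feats.shape[-1]))  # (T, P, P)
    return feats + attn @ v


def frame_enrich(frames, W1, W2):
    """Assumed global frame-level enrichment: an MLP that mixes features
    across the T frames, added back residually. frames: (T, D)."""
    hidden = np.maximum(W1 @ frames, 0.0)  # (H, D) after ReLU
    return frames + W2 @ hidden            # (T, D)


def enrich(feats, params):
    patches = patch_enrich(feats, *params["attn"])
    frames = frame_enrich(patches.mean(axis=1), *params["mlp"])  # pool patches per frame
    return patches, frames


def pair_reprs(frames):
    """L2-normalized 2-frame sub-sequences (i < j), each a concatenation of two frames."""
    pairs = np.stack([np.concatenate([frames[i], frames[j]])
                      for i, j in combinations(range(len(frames)), 2)])
    return pairs / np.linalg.norm(pairs, axis=1, keepdims=True)


def match_score(q_frames, s_frames):
    """Stand-in for the learned relational matching: mean over query pairs
    of the best cosine similarity to any support pair."""
    sim = pair_reprs(q_frames) @ pair_reprs(s_frames).T
    return sim.max(axis=1).mean()


def classify(query, support, labels, params):
    """Return (prediction from sub-sequence matching,
               prediction from the prototype-based query-class classifier)."""
    q_patches, q_frames = enrich(query, params)
    enriched = [enrich(s, params) for s in support]
    classes = sorted(set(labels))
    match = [np.mean([match_score(q_frames, f)
                      for (_, f), y in zip(enriched, labels) if y == c]) for c in classes]
    # Stand-in query-class similarity: negative squared distance between the
    # pooled query patch features and each class's mean pooled support features.
    q_vec = q_patches.mean(axis=(0, 1))
    protos = [np.mean([p.mean(axis=(0, 1))
                       for (p, _), y in zip(enriched, labels) if y == c], axis=0) for c in classes]
    sim = [-np.sum((q_vec - p) ** 2) for p in protos]
    return classes[int(np.argmax(match))], classes[int(np.argmax(sim))]


# Toy 5-way 1-shot episode: T=8 frames, P=4 patches per frame, D=16 dims.
rng = np.random.default_rng(0)
params = init_params(T=8, D=16, H=32, rng=rng)
support = [rng.normal(size=(8, 4, 16)) for _ in range(5)]
labels = [0, 1, 2, 3, 4]
query = support[2].copy()  # a query identical to the class-2 support video
print(classify(query, support, labels, params))  # (2, 2)
```

The sketch keeps the two heads separate. This follows the abstract, which places the classifier on patch-level features and the matching on enriched sub-sequences. The abstract does not say how STRM combines them during training, so the sketch leaves that out.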


01/15/2021

Temporal-Relational CrossTransformers for Few-Shot Action Recognition

We propose a novel approach to few-shot action recognition, finding temp...
03/28/2023

Rethinking matching-based few-shot action recognition

Few-shot action recognition, i.e. recognizing new action classes given o...
02/04/2016

Joint Recognition and Segmentation of Actions via Probabilistic Integration of Spatio-Temporal Fisher Vectors

We propose a hierarchical approach to multi-action recognition that perf...
03/04/2019

Collaborative Spatio-temporal Feature Learning for Video Action Recognition

Spatio-temporal feature learning is of central importance for action rec...
11/16/2021

SequentialPointNet: A strong parallelized point cloud sequence network for 3D action recognition

Point cloud sequences of 3D human actions exhibit unordered intra-frame ...
08/18/2023

Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching

Class prototype construction and matching are core aspects of few-shot a...
05/29/2019

Hierarchical Feature Aggregation Networks for Video Action Recognition

Most action recognition methods base on a) a late aggregation of frame l...