Dynamic Spatio-Temporal Specialization Learning for Fine-Grained Action Recognition

09/03/2022
by Tianjiao Li, et al.

The goal of fine-grained action recognition is to discriminate between action categories that differ only subtly. To tackle this, we draw inspiration from the human visual system, which contains specialized regions in the brain that are dedicated to handling specific tasks. We design a novel Dynamic Spatio-Temporal Specialization (DSTS) module, which consists of specialized neurons that are activated only for a subset of highly similar samples. During training, the loss forces the specialized neurons to learn the discriminative fine-grained differences that distinguish these similar samples, improving fine-grained recognition. Moreover, a spatio-temporal specialization method further optimizes the architectures of the specialized neurons so that each captures either more spatial or more temporal fine-grained information, to better handle the large range of spatio-temporal variations in the videos. Lastly, we design an Upstream-Downstream Learning algorithm to optimize our model's dynamic decisions during training, improving the performance of our DSTS module. We obtain state-of-the-art performance on two widely used fine-grained action recognition datasets.
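The idea of neurons that are activated only for a subset of similar samples can be illustrated with a toy routing sketch. This is not the DSTS implementation: the abstract does not specify how samples are matched to specialized neurons, so the cosine-similarity scoring, the `route` function, the `NEURON_KEYS` vectors and the sample features below are all hypothetical and chosen only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def route(feature, neuron_keys, top_k=1):
    """Return the indices of the top_k neurons whose key best matches the feature.

    Only these neurons would be activated for the sample; all others stay idle.
    """
    scores = [cosine(feature, key) for key in neuron_keys]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]

# Hypothetical key vectors, one per specialized neuron (assumption, not from the paper).
NEURON_KEYS = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# Hypothetical sample features: "a" and "b" are similar, "c" is different.
samples = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.1, 0.95]}

assignments = {name: route(feat, NEURON_KEYS) for name, feat in samples.items()}
```

In this sketch the two similar samples are routed to the same neuron, so any loss computed on that neuron's output would have to separate them using their remaining differences, which is the intuition the abstract describes.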


