Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition

10/20/2020 · by Yuqian Fu, et al.

Humans can easily recognize actions from only a few examples, while existing video recognition models still rely heavily on large-scale labeled data. This observation has motivated growing interest in few-shot video action recognition, which aims to learn new actions from only a few labeled samples. In this paper, we propose a depth-guided Adaptive Meta-Fusion Network for few-shot video recognition, termed AMeFu-Net. Concretely, we tackle the few-shot recognition problem from three aspects: first, we alleviate the extreme data scarcity by introducing depth information as a carrier of the scene, which brings extra visual information to our model; second, we fuse the representation of each original RGB clip with multiple non-strictly corresponding depth clips sampled by our temporal asynchronization augmentation mechanism, which synthesizes new instances at the feature level; third, we propose a novel Depth Guided Adaptive Instance Normalization (DGAdaIN) fusion module to fuse the two-stream modalities efficiently. Additionally, to better mimic the few-shot recognition process, our model is trained in a meta-learning manner. Extensive experiments on several action recognition benchmarks demonstrate the effectiveness of our model.
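The core idea of an AdaIN-style fusion module like DGAdaIN can be illustrated with a minimal sketch: instance-normalize the RGB clip features, then re-scale and re-shift them with affine parameters derived from the depth features. The sketch below is an assumption-laden illustration, not the paper's implementation: the feature shapes (`C` channels by `T` temporal positions), and the simple `tanh`-based maps standing in for the paper's learned affine-parameter networks, are all hypothetical.

```python
import numpy as np

def dg_adain(rgb_feat, depth_feat, eps=1e-5):
    """AdaIN-style fusion sketch: depth features modulate
    instance-normalized RGB clip features.

    rgb_feat:   (C, T) RGB clip features  (hypothetical shape)
    depth_feat: (C,)   depth clip embedding (hypothetical shape)
    """
    # Instance normalization over the temporal axis of the RGB stream.
    mu = rgb_feat.mean(axis=1, keepdims=True)
    sigma = rgb_feat.std(axis=1, keepdims=True)
    normalized = (rgb_feat - mu) / (sigma + eps)

    # Affine scale/shift predicted from the depth feature; these fixed
    # elementwise maps stand in for the learned layers of the real module.
    gamma = np.tanh(depth_feat)[:, None] + 1.0
    beta = 0.1 * depth_feat[:, None]
    return gamma * normalized + beta
```

Because the depth stream supplies only the affine parameters, different depth clips sampled by the temporal asynchronization augmentation yield different modulations of the same RGB clip, which is how new instances are synthesized at the feature level.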


Related research

09/30/2021 · Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation
We present MetaUVFS as the first Unsupervised Meta-learning algorithm fo...

09/07/2022 · Not All Instances Contribute Equally: Instance-adaptive Class Representation Learning for Few-Shot Visual Recognition
Few-shot visual recognition refers to recognize novel visual concepts fr...

08/19/2022 · Part-aware Prototypical Graph Network for One-shot Skeleton-based Action Recognition
In this paper, we study the problem of one-shot skeleton-based action re...

08/03/2023 · Multimodal Adaptation of CLIP for Few-Shot Action Recognition
Applying large-scale pre-trained visual models like CLIP to few-shot act...

09/09/2022 · One-Shot Open-Set Skeleton-Based Action Recognition
Action recognition is a fundamental capability for humanoid robots to in...

03/06/2023 · CLIP-guided Prototype Modulating for Few-shot Action Recognition
Learning from large-scale contrastive language-image pre-training like C...

08/10/2023 · Ensemble Modeling for Multimodal Visual Action Recognition
In this work, we propose an ensemble modeling approach for multimodal ac...
