Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation

09/30/2021
by Jay Patravali, et al.

We present MetaUVFS, the first unsupervised meta-learning algorithm for video few-shot action recognition. MetaUVFS leverages over 550K unlabeled videos to train a two-stream 2D and 3D CNN architecture via contrastive learning, capturing appearance-specific spatial features and action-specific spatio-temporal features, respectively. MetaUVFS comprises a novel Action-Appearance Aligned Meta-adaptation (A3M) module that learns to focus on the action-oriented video features in relation to the appearance features via explicit few-shot episodic meta-learning over unsupervised hard-mined episodes. Our action-appearance alignment and explicit few-shot learner condition the unsupervised training to mimic the downstream few-shot task, enabling MetaUVFS to significantly outperform all unsupervised methods on few-shot benchmarks. Moreover, unlike previous few-shot action recognition methods, which are supervised, MetaUVFS needs neither base-class labels nor a supervised pretrained backbone. Thus, we need to train MetaUVFS just once for it to perform competitively with, and sometimes even outperform, state-of-the-art supervised methods on the popular HMDB51, UCF101, and Kinetics100 few-shot datasets.
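
As a rough illustration of the contrastive learning mentioned above, the sketch below computes an InfoNCE-style loss over a batch of embedding pairs. The abstract does not specify the exact objective, so the choice of InfoNCE, the function name `info_nce_loss`, and the `temperature` value are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (an assumed objective, not confirmed by the abstract).

    Row i of `positives` is treated as the positive for row i of `anchors`;
    every other row in the batch acts as a negative.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarity matrix of shape (N, N), scaled by the temperature.
    logits = a @ p.T / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching pair for row i sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))
```

In a setup like the one described, `anchors` and `positives` could be embeddings of two augmented views of the same unlabeled videos. The loss is low when each anchor is most similar to its own positive and high when it is more similar to another item in the batch.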

