Few-shot Action Recognition via Intra- and Inter-Video Information Maximization

05/10/2023
by Huabin Liu, et al.

Current few-shot action recognition draws on two primary sources of information for classification: (1) intra-video information, determined by the frame content within a single video clip, and (2) inter-video information, measured by relationships (e.g., feature similarity) among videos. However, existing methods exploit both sources inadequately. For intra-video information, current sampling operations on input videos may omit critical action content, reducing the utilization efficiency of the video data. For inter-video information, action misalignment among videos makes it challenging to compute precise relationships. Moreover, how to jointly exploit intra- and inter-video information remains under-explored for few-shot action recognition. To this end, we propose a novel framework, Video Information Maximization (VIM), for few-shot video action recognition. VIM is equipped with an adaptive spatial-temporal video sampler and a spatiotemporal action alignment model to maximize intra- and inter-video information, respectively. The video sampler adaptively selects important frames and amplifies critical spatial regions of each input video based on the task at hand, preserving and emphasizing the informative parts of video clips while suppressing interference at the data level. The alignment model then performs temporal and spatial action alignment sequentially at the feature level, yielding more precise measurements of inter-video similarity. Both goals are further encouraged by additional loss terms based on mutual-information measurement. Consequently, VIM maximizes the distinctiveness of the video information extracted from limited video data. Extensive experiments on public few-shot action recognition datasets demonstrate the effectiveness and benefits of our framework.
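The two components described above can be illustrated with a minimal numerical sketch. Note the specifics here are stand-ins, not the paper's learned modules: frame importance is scored by a simple feature-norm heuristic instead of VIM's task-adaptive sampler, and temporal alignment is approximated with a dynamic-time-warping pass over cosine similarities rather than the paper's spatiotemporal alignment model.

```python
import numpy as np

def sample_frames(frames, k):
    """Keep the k highest-scoring frames of a (T, D) feature sequence,
    preserving temporal order. The score (feature norm) is only a
    placeholder for a learned, task-adaptive importance score."""
    scores = np.linalg.norm(frames, axis=1)        # one score per frame, (T,)
    keep = np.sort(np.argsort(scores)[-k:])        # top-k indices, re-sorted in time
    return frames[keep]

def aligned_similarity(a, b):
    """Inter-video similarity after temporal alignment: accumulate the
    maximum cosine similarity along a monotonic DTW-style warping path,
    normalized by the longer sequence length (illustrative only)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a @ b.T                                  # frame-to-frame cosine, (Ta, Tb)
    Ta, Tb = sim.shape
    acc = np.full((Ta + 1, Tb + 1), -np.inf)       # accumulated similarity table
    acc[0, 0] = 0.0
    for i in range(1, Ta + 1):
        for j in range(1, Tb + 1):
            best_prev = max(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = sim[i - 1, j - 1] + best_prev
    return acc[Ta, Tb] / max(Ta, Tb)
```

With this sketch, a video compared against itself scores higher than against a temporally shifted copy, which is the intuition behind aligning actions before measuring inter-video similarity.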


