Localizing the Common Action Among a Few Videos

08/13/2020
by Pengwan Yang, et al.

This paper strives to localize the temporal extent of an action in a long untrimmed video. Where existing work leverages many training examples annotated with the start, the end, and/or the class of the action, we propose few-shot common action localization: the start and end of an action in a long untrimmed video are determined from just a handful of trimmed video examples containing the same action, without knowing their common class label. To address this task, we introduce a new 3D convolutional network architecture that aligns representations from the support videos with the relevant query video segments. The network contains: (i) a mutual enhancement module that simultaneously complements the representations of the few trimmed support videos and the untrimmed query video; (ii) a progressive alignment module that iteratively fuses the support videos into the query branch; and (iii) a pairwise matching module that weighs the importance of the different support videos. Evaluations on few-shot common action localization in untrimmed videos containing a single or multiple action instances demonstrate the effectiveness and general applicability of our proposal.
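
The abstract describes the architecture only at a high level. Below is a minimal PyTorch sketch of how the three named modules could fit together, assuming pre-extracted 3D-CNN clip features. The module names follow the paper, but every layer choice, tensor shape, and fusion detail here is an illustrative assumption, not the authors' implementation.

```python
# Minimal sketch of the three modules, assuming pre-extracted 3D-CNN clip features.
# All layer sizes, shapes, and fusion details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MutualEnhancement(nn.Module):
    """Cross-attend support and query features so each complements the other."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, support, query):
        # support: (S, dim) pooled features of the few trimmed support videos
        # query:   (T, dim) per-segment features of the untrimmed query video
        s, q = support.unsqueeze(0), query.unsqueeze(0)
        support_enh, _ = self.attn(s, q, q)   # support attends to query segments
        query_enh, _ = self.attn(q, s, s)     # query attends to support videos
        return support_enh.squeeze(0), query_enh.squeeze(0)


class PairwiseMatching(nn.Module):
    """Weigh each support video by its similarity to the query representation."""
    def forward(self, support, query):
        sim = F.cosine_similarity(support, query.mean(0, keepdim=True), dim=-1)
        weights = sim.softmax(dim=0)                      # (S,) importance weights
        return (weights.unsqueeze(-1) * support).sum(0)   # weighted support prototype


class ProgressiveAlignment(nn.Module):
    """Iteratively fuse the weighted support prototype into the query branch."""
    def __init__(self, dim, steps=3):
        super().__init__()
        self.fuse = nn.ModuleList([nn.Linear(2 * dim, dim) for _ in range(steps)])

    def forward(self, prototype, query):
        for layer in self.fuse:
            expanded = prototype.expand_as(query)                        # (T, dim)
            query = F.relu(layer(torch.cat([query, expanded], dim=-1)))  # fuse step
        return query  # aligned per-segment features


# Toy usage: 3 support videos, 100 query segments, 512-d features.
support = torch.randn(3, 512)
query = torch.randn(100, 512)
support_f, query_f = MutualEnhancement(512)(support, query)
prototype = PairwiseMatching()(support_f, query_f)
aligned = ProgressiveAlignment(512)(prototype, query_f)
print(aligned.shape)  # torch.Size([100, 512])
```

In the paper these aligned per-segment features would feed a localization head that predicts the start and end of the common action; the sketch stops before that step.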

Related research

Few-Shot Transformation of Common Actions into Time and Space (04/06/2021)
Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs (01/09/2016)
Few-Shot Temporal Action Localization with Query Adaptive Transformer (10/20/2021)
Few-Shot Action Localization without Knowing Boundaries (06/08/2021)
Compound Prototype Matching for Few-shot Action Recognition (07/12/2022)
Semantic Video Moments Retrieval at Scale: A New Task and a Baseline (10/15/2022)
CMSN: Continuous Multi-stage Network and Variable Margin Cosine Loss for Temporal Action Proposal Generation (11/14/2019)
