SF-Net: Single-Frame Supervision for Temporal Action Localization

03/15/2020
by Fan Ma, et al.

In this paper, we study an intermediate form of supervision for temporal action localization (TAL): single-frame supervision. To obtain it, annotators are asked to identify only a single frame within the temporal window of each action. This significantly reduces the labor cost of full supervision, which requires annotating the action boundaries. Compared to weak supervision, which provides only a video-level label, single-frame supervision introduces extra temporal action signals while maintaining low annotation overhead. To make full use of such supervision, we propose a unified system called SF-Net. First, we predict an actionness score for each video frame. Along with a typical category score, the actionness score provides comprehensive information about the occurrence of a potential action and aids temporal boundary refinement during inference. Second, we mine pseudo action and background frames based on the single-frame annotations: we identify pseudo action frames by adaptively expanding each annotated single frame to its nearby, contextual frames, and we mine pseudo background frames from the unannotated frames across multiple videos. Together with the ground-truth labeled frames, these pseudo-labeled frames are further used to train the classifier.
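The mining step described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact procedure: it assumes per-frame actionness scores are available, grows a pseudo action span outward from the annotated frame while neighboring scores stay above a (hypothetical) fraction of the anchor's score, and picks the lowest-scoring unannotated frames as pseudo background. The `ratio` threshold and the choice of `k` are illustrative assumptions.

```python
def expand_pseudo_action_frames(scores, anchor, ratio=0.6):
    """Expand a single annotated frame into nearby pseudo action frames.

    A frame adjacent to the current span is absorbed as long as its
    actionness score stays above `ratio` times the anchor frame's score.
    (Hypothetical expansion criterion; SF-Net's actual rule may differ.)
    """
    thresh = ratio * scores[anchor]
    left, right = anchor, anchor
    while left - 1 >= 0 and scores[left - 1] >= thresh:
        left -= 1
    while right + 1 < len(scores) and scores[right + 1] >= thresh:
        right += 1
    return list(range(left, right + 1))


def mine_pseudo_background(scores, annotated, k=2):
    """Label the k unannotated frames with the lowest actionness
    as pseudo background frames."""
    candidates = [i for i in range(len(scores)) if i not in annotated]
    candidates.sort(key=lambda i: scores[i])  # lowest actionness first
    return sorted(candidates[:k])


# Toy example: one video, eight frames, the annotator marked frame 3.
scores = [0.1, 0.2, 0.7, 0.9, 0.8, 0.3, 0.1, 0.05]
action = expand_pseudo_action_frames(scores, anchor=3)   # -> [2, 3, 4]
background = mine_pseudo_background(scores, set(action)) # -> [0, 7]
```

Both pseudo-label sets would then be merged with the ground-truth annotated frames to supervise the frame-level classifier.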

