A Generalized Robust Framework For Timestamp Supervision in Temporal Action Segmentation

07/20/2022
by Rahul Rahaman, et al.

In temporal action segmentation, Timestamp supervision requires only a handful of labelled frames per video sequence. Previous works assign hard labels to the unlabelled frames, and their performance collapses rapidly under even subtle violations of the annotation assumptions. We propose a novel Expectation-Maximization (EM) based approach that leverages the label uncertainty of unlabelled frames and is robust enough to accommodate possible annotation errors. With accurate timestamp annotations, our proposed method produces state-of-the-art (SOTA) results and even exceeds the fully-supervised setup on several metrics and datasets. When applied to timestamp annotations with missing action segments, our method maintains stable performance. To further test the robustness of our formulation, we introduce a new, challenging annotation setup called Skip-tag supervision. This setup relaxes the annotation constraints and requires labels for only a fixed number of randomly chosen frames in a video, making it more flexible than Timestamp supervision while remaining competitive.
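To make the contrast between hard labels and label uncertainty concrete, the following is a minimal illustrative sketch of a soft-labelling (E-step-like) computation. It is not the authors' implementation. It assumes that exactly one action boundary lies between two consecutive annotated frames, that the prior over the boundary position is uniform, and that frame-wise log-probabilities from some segmentation model are available. The function name `soft_labels_between` and all parameter names are hypothetical.

```python
import numpy as np


def soft_labels_between(log_probs, t_left, t_right, c_left, c_right):
    """Soft labels for the frames between two annotated timestamps.

    Assumptions (illustrative, not taken from the paper):
      * exactly one action boundary lies between t_left and t_right,
      * the prior over the boundary position is uniform,
      * log_probs[t, c] is a model's log-probability of class c at frame t.

    Returns (post, w_left):
      post[k]   = posterior that the right action starts at frame t_left + k + 1
      w_left[i] = probability that frame t_left + i still belongs to c_left
    """
    seg = log_probs[t_left:t_right + 1]            # frames t_left .. t_right
    # left_cum[i]: log-likelihood of frames 0..i all being c_left
    left_cum = np.cumsum(seg[:, c_left])
    # right_cum[i]: log-likelihood of frames i..end all being c_right
    right_cum = np.cumsum(seg[::-1, c_right])[::-1]
    # Score of each boundary b = 1..n-1 (frames < b are left, frames >= b are right)
    scores = left_cum[:-1] + right_cum[1:]
    scores = scores - scores.max()                 # stabilise before exponentiating
    post = np.exp(scores)
    post /= post.sum()
    # Frame i is "left" whenever the boundary falls strictly after it
    w_left = np.concatenate([np.cumsum(post[::-1])[::-1], [0.0]])
    return post, w_left


# Toy example: 6 frames, 2 classes, with the true boundary at frame 3
probs = np.array([[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3)
post, w_left = soft_labels_between(np.log(probs), 0, 5, c_left=0, c_right=1)
```

A hard-labelling scheme would keep only the most likely boundary, whereas the weights `w_left` (and `1 - w_left` for the right action) could be used as soft targets in a weighted cross-entropy loss, which corresponds to the M-step under these assumptions.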


research
10/12/2022

Robust Action Segmentation from Timestamp Supervision

Action segmentation is the task of predicting an action label for each f...
research
03/11/2021

Temporal Action Segmentation from Timestamp Supervision

Temporal action segmentation approaches have been very successful recent...
research
03/15/2020

SF-Net: Single-Frame Supervision for Temporal Action Localization

In this paper, we study an intermediate form of supervision, i.e., singl...
research
12/22/2022

Timestamp-Supervised Action Segmentation in the Perspective of Clustering

Video action segmentation aims to slice the video into several action se...
research
06/29/2018

A flexible model for training action localization with varying levels of supervision

Spatio-temporal action detection in videos is typically addressed in a f...
research
07/28/2017

Localizing Actions from Video Labels and Pseudo-Annotations

The goal of this paper is to determine the spatio-temporal location of a...
research
05/12/2022

Weakly-Supervised Action Detection Guided by Audio Narration

Videos are more well-organized curated data sources for visual concept l...
