Weakly Supervised Energy-Based Learning for Action Segmentation

09/28/2019
by   Jun Li, et al.

This paper is about labeling video frames with action classes under weak supervision in training, where we have access to a temporal ordering of actions, but their start and end frames in training videos are unknown. Following prior work, we use an HMM grounded on a Gated Recurrent Unit (GRU) for frame labeling. Our key contribution is a new constrained discriminative forward loss (CDFL) that we use for training the HMM and GRU under weak supervision. While prior work typically estimates the loss on a single, inferred video segmentation, our CDFL discriminates between the energy of all valid and invalid frame labelings of a training video. A valid frame labeling satisfies the ground-truth temporal ordering of actions, whereas an invalid one violates the ground truth. We specify an efficient recursive algorithm for computing the CDFL in terms of the logadd function of the segmentation energy. Our evaluation on action segmentation and alignment gives superior results to those of the state of the art on the benchmark Breakfast Action, Hollywood Extended, and 50Salads datasets.
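To make the idea of a recursion over the logadd function concrete, here is a minimal Python sketch. It is not the paper's CDFL. It only accumulates, in log space, the total score of every frame labeling that respects a given action ordering, which corresponds to the "all valid labelings" part of the abstract. The function names, the `frame_log_scores` layout (one list of per-class log-scores per frame), and the rule that each action covers at least one frame are assumptions made for illustration. The HMM length model, the GRU, and the term for invalid labelings are all omitted.

```python
import math

NEG_INF = float("-inf")


def logadd(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = (a, b) if a >= b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))


def forward_log_score(frame_log_scores, transcript):
    """Log-sum of scores over all labelings that follow `transcript` in order.

    frame_log_scores[t][c] is the (hypothetical) log-score of class c at frame t.
    Each action in the transcript must occupy at least one frame.
    """
    num_frames = len(frame_log_scores)
    num_actions = len(transcript)
    if num_actions == 0 or num_frames < num_actions:
        return NEG_INF  # no valid labeling exists

    # alpha[n]: log-sum over partial labelings whose current frame is in action n.
    alpha = [NEG_INF] * num_actions
    alpha[0] = frame_log_scores[0][transcript[0]]

    for t in range(1, num_frames):
        new_alpha = [NEG_INF] * num_actions
        for n in range(num_actions):
            stay = alpha[n]                            # remain in action n
            move = alpha[n - 1] if n > 0 else NEG_INF  # advance from action n-1
            total = logadd(stay, move)
            if total != NEG_INF:
                new_alpha[n] = total + frame_log_scores[t][transcript[n]]
        alpha = new_alpha

    # A valid labeling must end in the last action of the transcript.
    return alpha[num_actions - 1]
```

With all log-scores set to zero, the function returns the log of the number of valid segmentations, which is a convenient sanity check. In an energy-based reading, the per-frame log-scores would play the role of negative energies.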

research
02/27/2020

Set-Constrained Viterbi for Set-Supervised Action Segmentation

This paper is about weakly supervised action segmentation, where ground ...
research
04/05/2021

Action Shuffle Alternating Learning for Unsupervised Action Segmentation

This paper addresses unsupervised action segmentation. Prior work captur...
research
04/05/2021

Anchor-Constrained Viterbi for Set-Supervised Action Segmentation

This paper is about action segmentation under weak supervision in traini...
research
10/19/2022

Temporal Action Segmentation: An Analysis of Modern Techniques

Temporal action segmentation from videos aims at the dense labeling of v...
research
10/12/2022

Robust Action Segmentation from Timestamp Supervision

Action segmentation is the task of predicting an action label for each f...
research
08/24/2022

Weakly Supervised Airway Orifice Segmentation in Video Bronchoscopy

Video bronchoscopy is routinely conducted for biopsies of lung tissue su...
research
01/09/2019

D^3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation

We address weakly-supervised action alignment and segmentation in videos...
