Learning to Segment Actions from Observation and Narration

05/07/2020
by Daniel Fried, et al.

We apply a generative segmental model of task structure, guided by narration, to action segmentation in video. We focus on unsupervised and weakly-supervised settings where no action labels are known during training. Despite its simplicity, our model performs competitively with previous work on a dataset of naturalistic instructional videos. Our model allows us to vary the sources of supervision used in training, and we find that both task structure and narrative language provide large benefits in segmentation quality.
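The abstract does not spell out the model's internals, but generative segmental (semi-Markov) models of the kind described typically score whole candidate segments under each action label and recover the best segmentation with a segmental Viterbi pass. A minimal sketch of that general decoding step is below; it is an illustration under assumptions, not the paper's actual model, and the `frame_scores` input and sum-of-frame-scores segment scoring are simplifications chosen for clarity.

```python
import numpy as np

def segmental_viterbi(frame_scores, max_seg_len):
    """Segment T frames into labeled, contiguous segments.

    frame_scores: (T, K) array of per-frame log-scores for K action labels.
    A segment [s, t) with label k scores the sum of its frame scores, so the
    decoder prefers long coherent segments over many short ones.
    Returns a list of (start, end, label) tuples maximizing the total score.
    """
    T, K = frame_scores.shape
    # prefix[t, k] = sum of scores for label k over frames [0, t)
    prefix = np.zeros((T + 1, K))
    prefix[1:] = np.cumsum(frame_scores, axis=0)

    best = np.full(T + 1, -np.inf)  # best[t]: best score over segmentations of [0, t)
    best[0] = 0.0
    back = [None] * (T + 1)         # backpointer: (start, label) of the last segment
    for t in range(1, T + 1):
        for s in range(max(0, t - max_seg_len), t):
            seg = prefix[t] - prefix[s]      # score of segment [s, t) under each label
            k = int(np.argmax(seg))
            score = best[s] + seg[k]
            if score > best[t]:
                best[t] = score
                back[t] = (s, k)

    # Walk backpointers from T to 0 to recover the segmentation.
    segments = []
    t = T
    while t > 0:
        s, k = back[t]
        segments.append((s, t, k))
        t = s
    return segments[::-1]
```

For example, ten frames whose scores favor label 0 for the first half and label 1 for the second decode into exactly two segments. Weak supervision (e.g. narration) would enter such a model by constraining or re-weighting which labels each segment may take, rather than by changing the dynamic program itself.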


Related research

06/02/2017
Temporal Action Labeling using Action Sets
Action detection and temporal segmentation of actions in videos are topi...

11/11/2021
Dense Unsupervised Learning for Video Segmentation
We present a novel approach to unsupervised learning for video object se...

05/19/2020
On Evaluating Weakly Supervised Action Segmentation Methods
Action segmentation is the task of temporally segmenting every frame of ...

05/06/2019
Spatio-Temporal Action Localization in a Weakly Supervised Setting
Enabling computational systems with the ability to localize actions in v...

05/04/2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
In this paper, we study the problem of procedure planning in instruction...

01/14/2022
Transformers in Action: Weakly Supervised Action Segmentation
The video action segmentation task is regularly explored under weaker fo...

01/09/2019
D^3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation
We address weakly-supervised action alignment and segmentation in videos...
