Hierarchical Modeling for Task Recognition and Action Segmentation in Weakly-Labeled Instructional Videos

10/12/2021
by Reza Ghoddoosian, et al.

This paper focuses on task recognition and action segmentation in weakly-labeled instructional videos, where only the ordered sequence of video-level actions is available during training. We propose a two-stream framework, which exploits semantic and temporal hierarchies to recognize top-level tasks in instructional videos. Further, we present a novel top-down weakly-supervised action segmentation approach, where the predicted task is used to constrain the inference of fine-grained action sequences. Experimental results on the popular Breakfast and Cooking 2 datasets show that our two-stream hierarchical task modeling significantly outperforms existing methods in top-level task recognition for all datasets and metrics. Additionally, using our task recognition framework in the proposed top-down action segmentation approach consistently improves the state of the art, while also reducing segmentation inference time by 80-90 percent.
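
The top-down idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes per-frame action log-probabilities and per-task scores are already available, and that each task comes with a small set of candidate ordered action sequences (transcripts). All names here (`align_score`, `top_down_segment`, `transcripts_by_task`) are hypothetical, and a plain Viterbi-style monotonic alignment stands in for whatever inference the paper actually uses.

```python
import math


def align_score(frame_logp, transcript):
    """Best log-score of a monotonic alignment of frames to an ordered
    transcript, where every action covers at least one frame."""
    num_frames, num_actions = len(frame_logp), len(transcript)
    if num_actions == 0 or num_actions > num_frames:
        return -math.inf
    # prev[n]: best score so far with the current frame assigned to action n
    prev = [-math.inf] * num_actions
    prev[0] = frame_logp[0][transcript[0]]
    for t in range(1, num_frames):
        cur = [-math.inf] * num_actions
        for n in range(num_actions):
            # Either stay in action n or advance from action n - 1.
            best = prev[n]
            if n > 0 and prev[n - 1] > best:
                best = prev[n - 1]
            if best > -math.inf:
                cur[n] = best + frame_logp[t][transcript[n]]
        prev = cur
    return prev[-1]


def top_down_segment(frame_logp, task_scores, transcripts_by_task):
    """Pick the highest-scoring task first, then align only that task's
    candidate transcripts instead of every transcript of every task."""
    task = max(task_scores, key=task_scores.get)
    candidates = transcripts_by_task[task]
    best = max(candidates, key=lambda tr: align_score(frame_logp, tr))
    return task, best, len(candidates)


# Toy input (made up for illustration): four frames, four actions.
def _logs(probs):
    return {action: math.log(p) for action, p in probs.items()}

frames = [
    _logs({"pour": 0.7, "stir": 0.1, "crack": 0.1, "fry": 0.1}),
    _logs({"pour": 0.7, "stir": 0.1, "crack": 0.1, "fry": 0.1}),
    _logs({"pour": 0.1, "stir": 0.7, "crack": 0.1, "fry": 0.1}),
    _logs({"pour": 0.1, "stir": 0.7, "crack": 0.1, "fry": 0.1}),
]
task_scores = {"coffee": 0.8, "eggs": 0.2}
transcripts_by_task = {
    "coffee": [["pour", "stir"], ["stir", "pour"]],
    "eggs": [["crack", "fry"]],
}
task, actions, num_aligned = top_down_segment(frames, task_scores, transcripts_by_task)
```

In this sketch only the predicted task's transcripts are aligned, so the number of alignments shrinks with the number of tasks excluded. That is one plausible reading of where the reported inference-time savings come from, not a reproduction of the paper's measurements.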

research
11/20/2020

Action Duration Prediction for Segment-Level Alignment of Weakly-Labeled Videos

This paper focuses on weakly-supervised action alignment, where only the...
research
11/12/2020

Adding Knowledge to Unsupervised Algorithms for the Recognition of Intent

The performance of computer vision algorithms is near or superior to humans in...
research
06/03/2019

A Hybrid RNN-HMM Approach for Weakly Supervised Temporal Action Segmentation

Action recognition has become a rapidly developing research field within...
research
03/23/2017

Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling

We present an approach for weakly supervised learning of human actions. ...
research
03/24/2022

Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos

This paper addresses a new problem of weakly-supervised online action se...
research
05/17/2018

NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning

Video learning is an important task in computer vision and has experienc...
research
03/09/2022

Part-level Action Parsing via a Pose-guided Coarse-to-Fine Framework

Action recognition from videos, i.e., classifying a video into one of th...
