Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences

01/29/2020
by Rosaura G. VidalMata, et al.

Understanding the structure of complex activities in videos is one of the many challenges faced by action recognition methods. To overcome this challenge, methods need not only a solid knowledge of the visual structure of the underlying features but also a good interpretation of how those features may change over time. Consequently, action segmentation must take into account not only the visual cues from individual frames but also their characteristics as a temporal sequence of features. This work presents our findings on the impact of incorporating both visual and temporal learning into an unsupervised action segmentation pipeline. We introduce a novel approach that extracts relevant visual and temporal features from untrimmed sequences in order to temporally localize the sub-activities within complex actions, without any labeling information. Through extensive experimentation on two benchmark datasets, Breakfast Actions and YouTube Instructions, we show that the proposed approach provides a meaningful visual and temporal embedding built from the visual cues of contiguous video frames, and that this embedding indeed helps in temporal segmentation.
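To make the general idea of combining visual and temporal cues more concrete, here is a minimal sketch of a generic unsupervised segmentation baseline in Python with NumPy. It is not the authors' method. It assumes per-frame visual features have already been extracted, appends a normalized timestamp to each frame (a simple, hypothetical stand-in for a learned temporal embedding), clusters the frames with plain k-means, and collapses the per-frame cluster ids into contiguous segments. The function names and the `time_weight` parameter are illustrative assumptions, not part of the paper.

```python
import numpy as np


def joint_embedding(features: np.ndarray, time_weight: float = 1.0) -> np.ndarray:
    """Concatenate per-frame visual features with a normalized timestamp.

    features: (T, D) array of per-frame visual descriptors (assumed precomputed).
    time_weight: hypothetical knob trading off visual vs. temporal similarity.
    """
    T = features.shape[0]
    # Relative position of each frame in the video, in [0, 1].
    t = np.linspace(0.0, 1.0, T).reshape(-1, 1)
    # L2-normalize the visual part so it lives on a scale comparable to t.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    visual = features / np.maximum(norms, 1e-8)
    return np.hstack([visual, time_weight * t])


def kmeans(x: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns one cluster id per row of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct rows (fancy indexing makes a copy).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every center: shape (N, k).
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def to_segments(labels: np.ndarray) -> list[tuple[int, int, int]]:
    """Collapse per-frame labels into (start, end_exclusive, label) runs."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, int(labels[start])))
            start = i
    return segments


if __name__ == "__main__":
    # Synthetic "video": three sub-activities of 20 frames each, 16-dim features.
    rng = np.random.default_rng(1)
    means = rng.normal(size=(3, 16))
    feats = np.vstack([means[s] + 0.1 * rng.normal(size=(20, 16)) for s in range(3)])
    labels = kmeans(joint_embedding(feats, time_weight=1.0), k=3)
    print(to_segments(labels))
```

Raising `time_weight` pushes frames that are close in time into the same cluster and so tends to produce fewer, longer segments; setting it to zero reduces the sketch to purely visual clustering. A fixed linear timestamp is the simplest possible temporal cue; the abstract describes a learned visual and temporal embedding, which this sketch does not attempt to reproduce.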

research
03/26/2018

Unsupervised Learning and Segmentation of Complex Activities from Video

This paper presents a new method for unsupervised segmentation of comple...
research
04/30/2021

Unsupervised Discriminative Embedding for Sub-Action Learning in Complex Activities

Action recognition and detection in the context of long untrimmed video ...
research
04/08/2019

Unsupervised learning of action classes with continuous temporal embedding

The task of temporally detecting and segmenting actions in untrimmed vid...
research
03/09/2023

TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering

Temporal action segmentation in untrimmed videos has gained increased at...
research
03/07/2017

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

We propose an unsupervised method for reference resolution in instructio...
research
05/27/2021

Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering

We present a novel approach for unsupervised activity segmentation, whic...
research
11/03/2021

Sequence-to-Sequence Modeling for Action Identification at High Temporal Resolution

Automatic action identification from video and kinematic data is an impo...