
Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences

by Rosaura G. VidalMata, et al.

Understanding the structure of complex activities in videos is one of the many challenges faced by action recognition methods. To overcome it, methods need not only solid knowledge of the visual structure of the underlying features but also a good interpretation of how those features change over time. Consequently, action segmentation must take into account not only the visual cues from individual frames but also their characteristics as a temporal sequence of features. This work presents our findings on the impact of incorporating both visual and temporal learning into an unsupervised action segmentation pipeline. We introduce a novel approach to extract relevant visual and temporal features from untrimmed sequences for the temporal localization of sub-activities within complex actions, without any labeling information. Through extensive experimentation on two benchmark datasets, Breakfast Actions and YouTube Instructions, we show that the proposed approach provides a meaningful visual and temporal embedding from the visual cues of contiguous video frames and that this embedding helps temporal segmentation.




Unsupervised Learning and Segmentation of Complex Activities from Video

This paper presents a new method for unsupervised segmentation of comple...

Unsupervised Discriminative Embedding for Sub-Action Learning in Complex Activities

Action recognition and detection in the context of long untrimmed video ...

Unsupervised learning of action classes with continuous temporal embedding

The task of temporally detecting and segmenting actions in untrimmed vid...

TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering

Temporal action segmentation in untrimmed videos has gained increased at...

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos

We propose an unsupervised method for reference resolution in instructio...

Unsupervised Activity Segmentation by Joint Representation Learning and Online Clustering

We present a novel approach for unsupervised activity segmentation, whic...

Sequence-to-Sequence Modeling for Action Identification at High Temporal Resolution

Automatic action identification from video and kinematic data is an impo...