Learning to Align Sequential Actions in the Wild

11/17/2021
by Weizhe Liu, et al.

State-of-the-art methods for self-supervised sequential action alignment rely on deep networks that find correspondences across videos in time. They either learn a frame-to-frame mapping across sequences, which does not leverage temporal information, or assume a monotonic alignment between each video pair, which ignores variations in the order of actions. As a result, these methods cannot handle common real-world scenarios that involve background frames or videos containing non-monotonic sequences of actions. In this paper, we propose an approach to align sequential actions in the wild that involve diverse temporal variations. To this end, we enforce temporal priors on the optimal transport matrix, which leverages temporal consistency while allowing for variations in the order of actions. Our model accounts for both monotonic and non-monotonic sequences and handles background frames that should not be aligned. We demonstrate that our approach consistently outperforms the state of the art in self-supervised sequential action representation learning on four different benchmark datasets.
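
To make the idea of a temporal prior on an optimal transport matrix more concrete, the following is a minimal sketch, not the formulation used in the paper. It assumes an entropy-regularized transport problem solved with Sinkhorn iterations, uniform marginals over frames, and a Gaussian prior that softly favors matches at similar relative timestamps. The function names, the parameters `eps` and `sigma`, and the random embeddings are all hypothetical choices made for illustration; background-frame handling is not modeled here.

```python
import numpy as np

def temporal_prior(n, m, sigma=0.5):
    """Gaussian prior favoring matches at similar relative time (an assumed form)."""
    i = np.arange(n)[:, None] / max(n - 1, 1)  # relative time of frames in video 1
    j = np.arange(m)[None, :] / max(m - 1, 1)  # relative time of frames in video 2
    return np.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))

def sinkhorn_with_prior(cost, prior, eps=0.5, n_iters=200):
    """Entropy-regularized OT whose plan is pulled toward `prior` (KL regularizer)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)            # uniform mass over frames of video 1
    b = np.full(m, 1.0 / m)            # uniform mass over frames of video 2
    K = prior * np.exp(-cost / eps)    # prior-weighted Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)                # rescale rows to match a
        v = b / (K.T @ u)              # rescale columns to match b
    return u[:, None] * K * v[None, :]

# Hypothetical frame embeddings for two videos of different lengths.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Y = rng.normal(size=(8, 4))

# Pairwise squared Euclidean cost, scaled to [0, 1] for numerical stability.
cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()

T = sinkhorn_with_prior(cost, temporal_prior(6, 8))
print(T.shape)
```

Because the prior is soft rather than a hard monotonicity constraint, the resulting plan can still place mass far from the diagonal when the embeddings strongly support it, which is the kind of flexibility the abstract describes for non-monotonic action orders.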

