Self-Supervised Video Representation Learning via Latent Time Navigation

05/10/2023
by   Di Yang, et al.
0

Self-supervised video representation learning aimed at maximizing similarity between different temporal segments of one video, in order to enforce feature persistence over time. This leads to loss of pertinent information related to temporal relationships, rendering actions such as `enter' and `leave' to be indistinguishable. To mitigate this limitation, we propose Latent Time Navigation (LTN), a time-parameterized contrastive learning strategy that is streamlined to capture fine-grained motions. Specifically, we maximize the representation similarity between different video segments from one video, while maintaining their representations time-aware along a subspace of the latent representation code including an orthogonal basis to represent temporal changes. Our extensive experimental analysis suggests that learning video representations by LTN consistently improves performance of action classification in fine-grained and human-oriented tasks (e.g., on Toyota Smarthome dataset). In addition, we demonstrate that our proposed model, when pre-trained on Kinetics-400, generalizes well onto the unseen real world video benchmark datasets UCF101 and HMDB51, achieving state-of-the-art performance in action recognition.

READ FULL TEXT
research
04/01/2021

Composable Augmentation Encoding for Video Representation Learning

We focus on contrastive methods for self-supervised video representation...
research
08/13/2020

Self-supervised Video Representation Learning by Pace Prediction

This paper addresses the problem of self-supervised video representation...
research
04/22/2023

NaviNeRF: NeRF-based 3D Representation Disentanglement by Latent Semantic Navigation

3D representation disentanglement aims to identify, decompose, and manip...
research
07/08/2021

Video 3D Sampling for Self-supervised Representation Learning

Most of the existing video self-supervised methods mainly leverage tempo...
research
09/26/2021

Self-Supervised Video Representation Learning by Video Incoherence Detection

This paper introduces a novel self-supervised method that leverages inco...
research
11/17/2021

Learning to Align Sequential Actions in the Wild

State-of-the-art methods for self-supervised sequential action alignment...
research
06/24/2020

PredNet and Predictive Coding: A Critical Review

PredNet, a deep predictive coding network developed by Lotter et al., co...

Please sign up or login with your details

Forgot password? Click here to reset