Learning Temporal Dynamics from Cycles in Narrated Video

01/07/2021
by Dave Epstein, et al.

Learning to model how the world changes as time elapses has proven a challenging problem for the computer vision community. We propose a self-supervised solution to this problem using temporal cycle consistency jointly in vision and language, training on narrated video. Our model learns modality-agnostic functions to predict forward and backward in time, which must undo each other when composed. This constraint leads to the discovery of high-level transitions between moments in time, since such transitions are easily inverted and shared across modalities. We justify the design of our model with an ablation study on different configurations of the cycle consistency problem. We then show qualitatively and quantitatively that our approach yields a meaningful, high-level model of the future and past. We apply the learned dynamics model without further training to various tasks, such as predicting future action and temporally ordering sets of images.
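The requirement that forward and backward prediction "must undo each other when composed" can be illustrated with a toy example. The sketch below is not the authors' model: it assumes hypothetical linear predictors acting on random stand-in embeddings and uses a plain squared-error penalty, only to show what a cycle consistency loss measures. All names (`cycle_loss`, `W_fwd`, `W_bwd_good`, `W_bwd_bad`) are illustrative assumptions.

```python
import numpy as np


def cycle_loss(x, forward, backward):
    """Mean squared error between x and backward(forward(x))."""
    x_future = forward(x)        # step forward in time
    x_back = backward(x_future)  # step back to the starting moment
    return float(np.mean((x_back - x) ** 2))


rng = np.random.default_rng(0)
dim = 4
x = rng.normal(size=(8, dim))    # 8 stand-in "moment" embeddings (assumed)

# Hypothetical forward predictor: a random orthogonal matrix,
# so its exact inverse is simply its transpose.
W_fwd, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
W_bwd_good = W_fwd.T             # inverts the forward step
W_bwd_bad = np.eye(dim)          # ignores the forward step entirely

consistent = cycle_loss(x, lambda v: v @ W_fwd, lambda v: v @ W_bwd_good)
inconsistent = cycle_loss(x, lambda v: v @ W_fwd, lambda v: v @ W_bwd_bad)

print(f"cycle loss with inverting backward map: {consistent:.2e}")
print(f"cycle loss with identity backward map:  {inconsistent:.2e}")
```

In this toy setting the loss is near zero only when the backward map inverts the forward map. A trained model would minimize such a penalty over learned predictors rather than fixed matrices.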


research
10/14/2020

Back to the Future: Cycle Encoding Prediction for Self-supervised Contrastive Video Representation Learning

In this paper we show that learning video feature spaces in which tempor...
research
11/24/2021

Learning State Representations via Retracing in Reinforcement Learning

We propose learning via retracing, a novel self-supervised approach for ...
research
04/16/2019

Temporal Cycle-Consistency Learning

We introduce a self-supervised representation learning method based on t...
research
03/18/2019

Learning Correspondence from the Cycle-Consistency of Time

We introduce a self-supervised method for learning visual correspondence...
research
06/27/2023

Semi-supervised Multimodal Representation Learning through a Global Workspace

Recent deep learning models can efficiently combine inputs from differen...
research
08/08/2023

Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction

The emerging field of action prediction plays a vital role in various co...
research
06/08/2018

Temporal Difference Variational Auto-Encoder

One motivation for learning generative models of environments is to use ...
