MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features

07/24/2023
by Adrien Bardes, et al.

Self-supervised learning of visual representations has focused on content features, which identify and differentiate objects in images and videos but do not capture object motion or location. Optical flow estimation, by contrast, is a task that does not require understanding the content of the images on which it is estimated. We unify the two approaches and introduce MC-JEPA, a joint-embedding predictive architecture and self-supervised learning approach that jointly learns optical flow and content features within a shared encoder. We demonstrate that the two associated objectives, optical flow estimation and self-supervised learning, benefit from each other and thus yield content features that incorporate motion information. The proposed approach achieves performance on par with existing unsupervised methods on optical flow benchmarks, and with common self-supervised learning approaches on downstream tasks such as semantic segmentation of images and videos.
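
The idea of training one shared encoder with two objectives can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the one-layer `encode` function, the cyclic-shift stand-in for flow warping, the mean-squared-error losses, and the `content_weight` parameter are all hypothetical. It only shows how a flow term and a content term computed from the same encoder might be summed into a single training loss.

```python
# Toy sketch (assumed, not the paper's method): two objectives, one shared encoder.

def encode(frame, weights):
    """Shared encoder (toy): one linear layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, frame))) for row in weights]


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def flow_loss(feat_t, feat_t1, shift):
    """Toy flow objective: features of frame t, cyclically shifted by an
    integer 'flow', should match the features of frame t+1."""
    n = len(feat_t)
    warped = [feat_t[(i - shift) % n] for i in range(n)]
    return mse(warped, feat_t1)


def content_loss(feat_a, feat_b):
    """Toy content objective: two views of one image should map to similar
    features. A real objective also needs a term that prevents collapse."""
    return mse(feat_a, feat_b)


def joint_loss(frame_t, frame_t1, view_a, view_b, weights, shift, content_weight=1.0):
    """Sum both objectives; every input passes through the same encoder weights."""
    f_t, f_t1 = encode(frame_t, weights), encode(frame_t1, weights)
    f_a, f_b = encode(view_a, weights), encode(view_b, weights)
    return flow_loss(f_t, f_t1, shift) + content_weight * content_loss(f_a, f_b)


# Identity weights make the encoder a plain ReLU, which keeps the numbers readable.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
loss = joint_loss(
    [1, 2, 3, 4], [4, 1, 2, 3],   # frame t+1 is frame t shifted by one position
    [1, 2, 3, 4], [1, 2, 3, 6],   # two slightly different views of one image
    identity, shift=1, content_weight=0.5,
)
print(loss)  # 0.5: the flow term is 0.0, the content term is 1.0 weighted by 0.5
```

Because both terms depend on the same `weights`, a gradient step on `joint_loss` would update the encoder with signal from motion and content at once, which is the coupling the abstract describes.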
