Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities

06/07/2023
by Andrii Zadaianchuk, et al.

Unsupervised video-based object-centric learning is a promising avenue to learn structured representations from large, unlabeled video collections, but previous approaches have only managed to scale to real-world datasets in restricted domains. Recently, it was shown that the reconstruction of pre-trained self-supervised features leads to object-centric representations on unconstrained real-world image datasets. Building on this approach, we propose a novel way to use such pre-trained features in the form of a temporal feature similarity loss. This loss encodes temporal correlations between image patches and is a natural way to introduce a motion bias for object discovery. We demonstrate that this loss leads to state-of-the-art performance on the challenging synthetic MOVi datasets. When used in combination with the feature reconstruction loss, our model is the first object-centric video model that scales to unconstrained video datasets such as YouTube-VIS.
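
The abstract does not give the exact form of the temporal feature similarity loss, so the following is only a minimal sketch under stated assumptions: patch features of two frames are compared by cosine similarity, the similarities are turned into a per-patch distribution with a temperature softmax, and the model is trained with cross-entropy to predict that distribution. The function names, the temperature value, and the random stand-in features and logits are all hypothetical and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Scale each feature vector to unit length so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(x):
    # Numerically stable log-softmax over the last axis.
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def similarity_targets(feats_t, feats_next, temperature=0.1):
    # feats_t, feats_next: (num_patches, dim) pre-trained patch features of two frames.
    # Returns a (num_patches, num_patches) matrix whose row i is a distribution over
    # patches of the later frame, peaked where patch i most likely moved to.
    # The temperature is an assumed hyperparameter.
    sim = l2_normalize(feats_t) @ l2_normalize(feats_next).T
    return np.exp(log_softmax(sim / temperature))


def similarity_loss(pred_logits, targets):
    # Cross-entropy between predicted per-patch distributions and the targets,
    # averaged over patches (an assumed choice of loss).
    return float(-(targets * log_softmax(pred_logits)).sum(axis=-1).mean())


# Stand-in data: 8 patches with 16-dimensional features; the later frame is a
# slightly perturbed copy, so each patch should match itself best.
rng = np.random.default_rng(0)
feats_t = rng.normal(size=(8, 16))
feats_next = feats_t + 0.05 * rng.normal(size=(8, 16))

targets = similarity_targets(feats_t, feats_next)
stand_in_logits = rng.normal(size=(8, 8))  # placeholder for a model's output
loss = similarity_loss(stand_in_logits, targets)
```

In this sketch the targets depend only on frozen features, so the motion signal comes from where similar patches reappear in the later frame rather than from any labels.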

