Self-Supervision by Prediction for Object Discovery in Videos

03/09/2021
by Beril Besbinar, et al.
Despite their remarkable success, deep learning algorithms still rely heavily on annotated data. Unsupervised settings, on the other hand, pose many challenges, particularly in determining the right inductive bias for diverse scenarios. One scalable solution, known as self-supervised learning, is to have the model generate its own supervision from part of the input data. In this paper, we use prediction as the self-supervision task and build a novel object-centric model for image sequence representation. In addition to disentangling the notion of objects from the motion dynamics, our compositional structure explicitly handles occlusion and inpaints the inferred objects and the background to compose the predicted frame. With the aid of auxiliary loss functions that promote spatially and temporally consistent object representations, our self-supervised framework can be trained without any manual annotation or pretrained network. Initial experiments confirm that the proposed pipeline is a promising step towards object-centric video prediction.
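
The abstract does not spell out how the predicted frame is composed, so the following is only a minimal sketch of one generic way to do occlusion-aware composition: layers are blended back to front, so nearer objects overwrite farther ones. The function name `compose_frame`, the array shapes, and the assumption that objects arrive already sorted by depth are all illustrative choices, not details taken from the paper.

```python
import numpy as np

def compose_frame(background, appearances, masks):
    """Blend object layers over a background, back to front (illustrative only).

    background:  (H, W, C) array, e.g. an inpainted background image.
    appearances: (K, H, W, C) array, one full (inpainted) image per object.
    masks:       (K, H, W) array with values in [0, 1], assumed to be
                 ordered from the farthest object to the nearest one.
    """
    frame = np.asarray(background, dtype=np.float64).copy()
    for appearance, mask in zip(appearances, masks):
        alpha = mask[..., None]  # add a channel axis so the mask broadcasts
        # A nearer layer replaces whatever was composed so far wherever its
        # mask is 1, which is how occlusion is represented in this sketch.
        frame = alpha * appearance + (1.0 - alpha) * frame
    return frame
```

Because each object carries a full image rather than only its visible pixels, regions hidden behind a nearer object are still defined, which is one reason inpainting and explicit occlusion handling fit together naturally in a layered formulation like this.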


