
Self-Supervised Equivariant Scene Synthesis from Video

by Cinjon Resnick, et al.
New York University

We propose a self-supervised framework that learns scene representations from video, automatically delineated into background, characters, and their animations. Our method capitalizes on moving characters being equivariant with respect to their transformation across frames, while the background remains invariant under that same transformation. After training, we can manipulate image encodings in real time to create unseen combinations of the delineated components. To our knowledge, ours is the first method to perform unsupervised extraction and synthesis of interpretable background, character, and animation. We demonstrate results on three datasets: Moving MNIST with backgrounds, 2D video game sprites, and Fashion Modeling.
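The equivariance/invariance constraint above can be sketched as a simple two-term objective: across a pair of frames related by a transformation, the character latent should transform accordingly, while the background latent should stay fixed. The sketch below is illustrative only (the function names, latent shapes, and the use of a known translation are assumptions, not the paper's actual architecture):

```python
import numpy as np

def equivariance_loss(bg1, ch1, bg2, ch2, T):
    """Toy objective for two frames related by transformation T.

    bg1, bg2: background latents of frames 1 and 2 (should be invariant).
    ch1, ch2: character latents of frames 1 and 2 (should be equivariant).
    T: function acting on the character latent (the frame-to-frame transform).
    """
    # Equivariance: the character latent of frame 2 should match the
    # transformed character latent of frame 1.
    char_term = np.mean((T(ch1) - ch2) ** 2)
    # Invariance: the background latent should be unchanged by T.
    bg_term = np.mean((bg1 - bg2) ** 2)
    return char_term + bg_term

# Toy example: the character latent is a 2D position and T is a unit
# translation along x (a hypothetical stand-in for a learned transform).
T = lambda z: z + np.array([1.0, 0.0])
bg = np.ones(4)                    # static background latent
ch1 = np.array([2.0, 3.0])         # character position in frame 1
ch2 = T(ch1)                       # character moved consistently in frame 2

loss = equivariance_loss(bg, ch1, bg, ch2, T)  # → 0.0 for a perfect fit
```

A perfectly delineated scene drives both terms to zero, which is what lets the trained encodings be recombined at synthesis time: swapping the background latent leaves the character and its animation untouched, and vice versa.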



