Patch-based Object-centric Transformers for Efficient Video Generation

06/08/2022
by   Wilson Yan, et al.

In this work, we present Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture that leverages object-centric information to efficiently model temporal dynamics in videos. We build upon prior work in video prediction via an autoregressive transformer over the discrete latent space of compressed videos, with an added modification to model object-centric information via bounding boxes. Because object-centric representations are more compressible, we can improve training efficiency by allowing the model to access only object information for longer-horizon temporal context. When evaluated on various difficult object-centric datasets, our method achieves performance better than or equal to that of other video generation models, while remaining more computationally efficient and scalable. In addition, we show that our method supports object-centric control through bounding box manipulation, which may aid downstream tasks such as video editing or visual planning. Samples are available at https://sites.google.com/view/povt-public
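
The efficiency argument in the abstract can be illustrated with a rough token count. The sketch below is not the authors' implementation: the frame count, tokens per frame, object tokens per frame, and the number of recent frames kept in full are all hypothetical values chosen for illustration, and the quadratic self-attention cost is a standard simplification.

```python
def context_tokens(num_frames, tokens_per_frame, object_tokens_per_frame, recent_frames):
    """Count context tokens when only the most recent frames are kept in full
    and each older frame contributes only its object-region tokens."""
    recent = min(recent_frames, num_frames)
    older = num_frames - recent
    return recent * tokens_per_frame + older * object_tokens_per_frame


def attention_cost(n_tokens):
    """Simplified self-attention cost: quadratic in the number of tokens."""
    return n_tokens * n_tokens


# Hypothetical setting: 16 frames, 256 latent tokens per frame,
# 32 object-patch tokens per frame, last 4 frames kept in full.
full = context_tokens(16, 256, 256, 16)    # every frame kept in full
sparse = context_tokens(16, 256, 32, 4)    # older frames reduced to object tokens

print("full context tokens:  ", full)
print("sparse context tokens:", sparse)
print("attention cost ratio: ", attention_cost(full) / attention_cost(sparse))
```

Under these assumed numbers the context shrinks from 4096 to 1408 tokens, which is the kind of saving that makes longer temporal horizons affordable; the actual ratio depends on how many object patches a dataset requires.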


research
06/09/2023

DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles

We propose a new object-centric video prediction algorithm based on the ...
research
10/13/2021

Object-Region Video Transformers

Evidence from cognitive psychology suggests that understanding spatio-te...
research
05/06/2021

Object-centric Video Prediction without Annotation

In order to interact with the world, agents must be able to predict the ...
research
05/25/2023

Concept-Centric Transformers: Concept Transformers with Object-Centric Concept Learning for Interpretability

Attention mechanisms have greatly improved the performance of deep-learn...
research
08/15/2023

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

We introduce an object-aware decoder for improving the performance of sp...
research
07/20/2022

Is an Object-Centric Video Representation Beneficial for Transfer?

The objective of this work is to learn an object-centric video represent...
research
04/01/2023

SVT: Supertoken Video Transformer for Efficient Video Understanding

Whether by processing videos with fixed resolution from start to end or ...
