SceneScape: Text-Driven Consistent Scene Generation

02/02/2023
by Rafail Fridman, et al.

We propose a method for text-driven perpetual view generation – synthesizing long videos of arbitrary scenes solely from an input text describing the scene and the camera poses. We introduce a novel framework that generates such videos in an online fashion by combining the generative power of a pre-trained text-to-image model with the geometric priors learned by a pre-trained monocular depth prediction model. To achieve 3D consistency, i.e., to generate videos that depict geometrically-plausible scenes, we deploy online test-time training that encourages the predicted depth map of the current frame to be geometrically consistent with the synthesized scene; the depth maps are used to construct a unified mesh representation of the scene, which is updated throughout the generation and used for rendering. In contrast to previous works, which are applicable only to limited domains (e.g., landscapes), our framework generates diverse scenes, such as walkthroughs in spaceships, caves, or ice castles. Project page: https://scenescape.github.io/
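The online loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the model calls (`inpaint_frame`, `predict_depth`, `render_from_mesh`) are placeholder stubs standing in for the pre-trained text-to-image inpainting model, the monocular depth predictor, and a mesh renderer, and the mesh "fusion" step is reduced to appending per-frame data.

```python
import numpy as np

H, W = 64, 64  # small frame size for the sketch

def inpaint_frame(partial_rgb, mask, prompt):
    """Stub for the pre-trained text-to-image inpainting model."""
    out = partial_rgb.copy()
    out[mask] = 0.5  # fill disoccluded pixels with a constant color
    return out

def predict_depth(rgb):
    """Stub for the pre-trained monocular depth predictor."""
    return np.ones((H, W))

def finetune_depth(depth, rendered_depth, mask):
    """Test-time update: align the predicted depth with the depth
    rendered from the existing scene mesh in the visible region."""
    visible = ~mask
    scale = rendered_depth[visible].mean() / depth[visible].mean()
    return depth * scale

def render_from_mesh(mesh, pose):
    """Stub renderer: project the scene mesh into the new camera.
    Returns an RGB image, a depth map, and a disocclusion mask."""
    rgb = np.full((H, W, 3), 0.2)
    depth = np.full((H, W), 2.0)
    mask = np.zeros((H, W), dtype=bool)
    mask[:, W // 2:] = True  # pretend the right half is newly revealed
    return rgb, depth, mask

def generate(prompt, poses):
    mesh = []    # unified scene mesh, grown frame by frame
    frames = []
    for pose in poses:
        rgb, rdepth, mask = render_from_mesh(mesh, pose)
        rgb = inpaint_frame(rgb, mask, prompt)       # synthesize new content
        depth = predict_depth(rgb)                   # monocular depth
        depth = finetune_depth(depth, rdepth, mask)  # test-time consistency
        mesh.append((rgb, depth, pose))              # unproject & fuse (stub)
        frames.append(rgb)
    return frames

frames = generate("a walkthrough in an ice castle", poses=range(5))
print(len(frames))  # 5
```

Each iteration renders the current mesh into the next camera pose, inpaints the newly revealed regions, predicts depth for the completed frame, aligns that depth with the existing geometry, and fuses the result back into the scene representation.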



Related research

05/19/2023
Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields
Text-driven 3D scene generation is widely applicable to video gaming, fi...

11/22/2022
DiffDreamer: Consistent Single-view Perpetual View Generation with Conditional Diffusion Models
Perpetual view generation – the task of generating long-range novel view...

03/21/2023
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
We present Text2Room, a method for generating room-scale textured 3D mes...

03/23/2023
Persistent Nature: A Generative Model of Unbounded 3D Worlds
Despite increasingly realistic image quality, recent 3D image generative...

07/22/2022
InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images
We present a method for learning to generate unbounded flythrough videos...

03/09/2023
3D Video Loops from Asynchronous Input
Looping videos are short video clips that can be looped endlessly withou...

02/03/2020
DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data
We present a method for depth estimation with monocular images, which ca...
