Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations

11/25/2021
by Mehdi S. M. Sajjadi, et al.

A classical problem in computer vision is to infer a 3D scene representation from few images that can be used to render novel views at interactive rates. Previous work focuses on reconstructing pre-defined 3D representations, e.g. textured meshes, or implicit representations, e.g. radiance fields, and often requires input images with precise camera poses and long processing times for each novel scene. In this work, we propose the Scene Representation Transformer (SRT), a method which processes posed or unposed RGB images of a new area, infers a "set-latent scene representation", and synthesizes novel views, all in a single feed-forward pass. To calculate the scene representation, we propose a generalization of the Vision Transformer to sets of images, enabling global information integration, and hence 3D reasoning. An efficient decoder transformer parameterizes the light field by attending into the scene representation to render novel views. Learning is supervised end-to-end by minimizing a novel-view reconstruction error. We show that this method outperforms recent baselines in terms of PSNR and speed on synthetic datasets, including a new dataset created for the paper. Further, we demonstrate that SRT scales to support interactive visualization and semantic segmentation of real-world outdoor environments using Street View imagery.

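To make the pipeline in the abstract concrete, below is a minimal sketch of an SRT-style model written from the abstract alone: an encoder transformer pools patch tokens from a set of input views into a set-latent scene representation, and a decoder transformer predicts a color for each query ray by attending into that representation. All module names (SRTSketch, fourier_encode), layer counts, dimensions, and the Fourier ray parameterization are illustrative assumptions, not the authors' implementation; note also that a standard PyTorch decoder layer self-attends among the ray queries, which a purely cross-attending light-field decoder would not require.

```python
# Minimal sketch of an SRT-style encoder/decoder, based only on the abstract.
# Module names, sizes, and the ray encoding are assumptions, not the paper's code.
import math

import torch
import torch.nn as nn


def fourier_encode(x, num_bands=8):
    """Map ray coordinates to sin/cos features (assumed parameterization)."""
    freqs = 2.0 ** torch.arange(num_bands, device=x.device) * math.pi
    angles = x[..., None] * freqs                        # (..., D, num_bands)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(-2)                               # (..., D * 2 * num_bands)


class SRTSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_enc_layers=4, n_dec_layers=2):
        super().__init__()
        # Per-image patch embedding (a small convolutional stem is one plausible choice).
        self.patchify = nn.Conv2d(3, d_model, kernel_size=8, stride=8)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_enc_layers)
        # Decoder: query rays attend into the set-latent scene representation.
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, n_dec_layers)
        self.ray_embed = nn.Linear(6 * 2 * 8, d_model)   # origin + direction, Fourier-encoded
        self.to_rgb = nn.Linear(d_model, 3)

    def encode(self, images):
        """images: (B, N_views, 3, H, W) -> set-latent scene representation."""
        b, n, c, h, w = images.shape
        tokens = self.patchify(images.flatten(0, 1))      # (B*N, d, h', w')
        tokens = tokens.flatten(2).transpose(1, 2)        # (B*N, h'*w', d)
        tokens = tokens.reshape(b, -1, tokens.shape[-1])  # pool patches across all views
        return self.encoder(tokens)                       # (B, N*h'*w', d)

    def render(self, scene, ray_origins, ray_dirs):
        """Predict one color per query ray by attending into the scene representation."""
        rays = torch.cat([ray_origins, ray_dirs], dim=-1)     # (B, R, 6)
        queries = self.ray_embed(fourier_encode(rays))        # (B, R, d)
        features = self.decoder(queries, scene)               # (B, R, d)
        return self.to_rgb(features).sigmoid()                # (B, R, 3), RGB in [0, 1]


if __name__ == "__main__":
    model = SRTSketch()
    imgs = torch.rand(1, 5, 3, 64, 64)                        # 5 unposed input views
    origins = torch.rand(1, 1024, 3)
    dirs = torch.nn.functional.normalize(torch.rand(1, 1024, 3), dim=-1)
    colors = model.render(model.encode(imgs), origins, dirs)
    print(colors.shape)                                       # torch.Size([1, 1024, 3])
```

Training such a sketch would minimize a pixel-wise reconstruction loss (e.g. mean squared error) between rendered and ground-truth colors of held-out novel views, matching the end-to-end supervision described in the abstract.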

Related research

10/04/2022 · Self-improving Multiplane-to-layer Images for Novel View Synthesis
We present a new method for lightweight novel-view synthesis that genera...

06/22/2015 · DeepStereo: Learning to Predict New Views from the World's Imagery
Deep networks have recently enjoyed enormous success when applied to rec...

11/19/2014 · Visual Noise from Natural Scene Statistics Reveals Human Scene Category Representations
Our perceptions are guided both by the bottom-up information entering ou...

09/11/2023 · PAg-NeRF: Towards fast and efficient end-to-end panoptic 3D representations for agricultural robotics
Precise scene understanding is key for most robot monitoring and interve...

09/23/2022 · PNeRF: Probabilistic Neural Scene Representations for Uncertain 3D Visual Mapping
Recently neural scene representations have provided very impressive resu...

12/05/2022 · 3D-LatentMapper: View Agnostic Single-View Reconstruction of 3D Shapes
Computer graphics, 3D computer vision and robotics communities have prod...

03/06/2023 · Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision
We address efficient and structure-aware 3D scene representation from im...
