Towards Visual Foundational Models of Physical Scenes

06/06/2023
by   Chethan Parameshwara, et al.

We describe a first step towards learning general-purpose visual representations of physical scenes using only image prediction as a training criterion. To do so, we first define "physical scene" and show that, even though different agents may maintain different representations of the same scene, the underlying physical scene that can be inferred is unique. Then, we show that NeRFs cannot represent the physical scene, as they lack extrapolation mechanisms. Such mechanisms, however, could be provided by Diffusion Models, at least in theory. To test this hypothesis empirically, NeRFs can be combined with Diffusion Models, a process we refer to as NeRF Diffusion, and the combination used as an unsupervised representation of the physical scene. Our analysis is limited to visual data, without the external grounding mechanisms that independent sensory modalities can provide.

research
12/02/2022

Prediction of Scene Plausibility

Understanding the 3D world from 2D images involves more than detection a...
research
02/17/2017

Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision

People can refer to quantities in a visual scene by using either exact c...
research
01/02/2023

Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data

In this paper, we learn a diffusion model to generate 3D data on a scene...
research
12/10/2015

3D Reconstruction of Crime Scenes and Design Considerations for an Interactive Investigation Tool

Crime Scene Investigation (CSI) is a carefully planned systematic proces...
research
10/06/2019

Neural Multisensory Scene Inference

For embodied agents to infer representations of the underlying 3D physic...
research
06/09/2023

Beyond Surface Statistics: Scene Representations in a Latent Diffusion Model

Latent diffusion models (LDMs) exhibit an impressive ability to produce ...
research
05/29/2023

Generating Driving Scenes with Diffusion

In this paper we describe a learned method of traffic scene generation d...
