Neural Volumes: Learning Dynamic Renderable Volumes from Images

by Stephen Lombardi, et al.

Modeling and rendering of dynamic scenes is challenging, as natural scenes often contain complex phenomena such as thin structures, evolving topology, translucency, scattering, occlusion, and biological motion. Mesh-based reconstruction and tracking often fail in these cases, and other approaches (e.g., light field video) typically rely on constrained viewing conditions, which limit interactivity. We circumvent these difficulties by presenting a learning-based approach to representing dynamic objects inspired by the integral projection model used in tomographic imaging. The approach is supervised directly from 2D images in a multi-view capture setting and does not require explicit reconstruction or tracking of the object. Our method has two primary components: an encoder-decoder network that transforms input images into a 3D volume representation, and a differentiable ray-marching operation that enables end-to-end training. By virtue of its 3D representation, our construction extrapolates better to novel viewpoints compared to screen-space rendering techniques. The encoder-decoder architecture learns a latent representation of a dynamic scene that enables us to produce novel content sequences not seen during training. To overcome memory limitations of voxel-based representations, we learn a dynamic irregular grid structure implemented with a warp field during ray-marching. This structure greatly improves the apparent resolution and reduces grid-like artifacts and jagged motion. Finally, we demonstrate how to incorporate surface-based representations into our volumetric-learning framework for applications where the highest resolution is required, using facial performance capture as a case in point.
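The differentiable ray-marching described above accumulates color and opacity along each camera ray through the decoded voxel grid, in the spirit of the integral projection model from tomographic imaging. The following is a minimal, heavily simplified sketch of that accumulation step (nearest-neighbor voxel lookup, no warp field, no learned components); the function name, grid extent of [-1, 1]^3, and sampling scheme are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def march_ray(rgb, alpha, origin, direction, t_min, t_max, n_steps):
    """Front-to-back compositing of an RGBA voxel grid along one ray.

    rgb:   (D, D, D, 3) color volume with values in [0, 1]
    alpha: (D, D, D)    per-sample opacity in [0, 1]
    The grid is assumed to span the cube [-1, 1]^3 (an assumption
    of this sketch). Returns the composited color and the remaining
    transmittance (how much background would still show through).
    """
    d = rgb.shape[0]
    color = np.zeros(3)
    transmittance = 1.0
    for t in np.linspace(t_min, t_max, n_steps):
        p = origin + t * direction                      # sample point in world space
        idx = ((p + 1.0) * 0.5 * (d - 1)).astype(int)   # nearest-voxel index
        if np.any(idx < 0) or np.any(idx >= d):
            continue                                    # sample lies outside the volume
        a = alpha[tuple(idx)]
        color += transmittance * a * rgb[tuple(idx)]    # add this sample's contribution
        transmittance *= (1.0 - a)                      # attenuate light from behind it
    return color, transmittance

# A fully opaque red volume: the ray should return pure red
# and zero transmittance (nothing behind the volume is visible).
rgb = np.zeros((8, 8, 8, 3)); rgb[..., 0] = 1.0
alpha = np.ones((8, 8, 8))
c, T = march_ray(rgb, alpha,
                 origin=np.array([0.0, 0.0, -2.0]),
                 direction=np.array([0.0, 0.0, 1.0]),
                 t_min=0.0, t_max=4.0, n_steps=64)
```

Because every operation in the loop is a smooth function of the voxel values (in a framework with automatic differentiation, e.g. PyTorch), gradients of a pixel-reconstruction loss flow back through this compositing into the decoder that produced the volume, which is what makes end-to-end training from 2D images possible. The warp-field variant in the paper additionally transforms `p` through a learned deformation before the voxel lookup.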



