Monocular Real-Time Volumetric Performance Capture

by Ruilong Li, et al.

We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video, eliminating the need for expensive multi-view systems or cumbersome pre-acquisition of a personalized template model. Our system reconstructs a fully textured 3D human from each frame by leveraging the Pixel-Aligned Implicit Function (PIFu). While PIFu achieves high-resolution reconstruction in a memory-efficient manner, its computationally expensive inference prevents us from deploying such a system for real-time applications. To this end, we propose a novel hierarchical surface localization algorithm and a direct rendering method that does not explicitly extract surface meshes. By culling unnecessary regions for evaluation in a coarse-to-fine manner, we successfully accelerate the reconstruction by two orders of magnitude from the baseline without compromising the quality. Furthermore, we introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples. We adaptively update the sampling probability of the training data based on the current reconstruction accuracy, which effectively alleviates reconstruction artifacts. Our experiments and evaluations demonstrate the robustness of our system to various challenging angles, illuminations, poses, and clothing styles. We also show that our approach compares favorably with state-of-the-art monocular performance capture methods. Our proposed approach removes the need for multi-view studio settings and enables a consumer-accessible solution for volumetric capture.
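The coarse-to-fine culling idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or code: it is a simplified two-level version, assuming an analytic sphere (`sphere_occupancy`) as a stand-in for the learned implicit function, and all function names, resolutions, and the 3x3x3 neighbourhood test are hypothetical choices made for illustration. The sketch evaluates the function densely on a coarse grid, flags coarse cells whose neighbours disagree on occupancy, and queries the fine grid only inside those flagged cells.

```python
import numpy as np

def sphere_occupancy(points, radius=0.6):
    """Toy stand-in for a learned implicit function: 1 inside a sphere, 0 outside."""
    return (np.linalg.norm(points, axis=-1) < radius).astype(np.float32)

def grid_points(res):
    """Cell-center coordinates of a res^3 grid covering [-1, 1]^3."""
    axis = (np.arange(res) + 0.5) / res * 2.0 - 1.0
    return np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)

def upsample(volume, factor):
    """Nearest-neighbour upsampling of a 3D volume along all three axes."""
    for ax in range(3):
        volume = np.repeat(volume, factor, axis=ax)
    return volume

def coarse_to_fine(occupancy_fn, coarse_res=16, factor=4):
    """Evaluate densely at coarse_res, then refine only cells near the surface."""
    shape = (coarse_res,) * 3
    coarse = occupancy_fn(grid_points(coarse_res).reshape(-1, 3)).reshape(shape)
    n_evals = coarse.size

    # Flag a coarse cell if any cell in its 3x3x3 neighbourhood has a
    # different occupancy value (assumed heuristic for "near the surface").
    padded = np.pad(coarse, 1, mode="edge")
    near = np.zeros(shape, dtype=bool)
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                shifted = padded[dx:dx + coarse_res,
                                 dy:dy + coarse_res,
                                 dz:dz + coarse_res]
                near |= shifted != coarse

    # Unflagged regions inherit the coarse value; flagged regions are re-queried.
    fine = upsample(coarse, factor)
    fine_mask = upsample(near, factor)
    pts = grid_points(coarse_res * factor)[fine_mask]
    fine[fine_mask] = occupancy_fn(pts)
    n_evals += len(pts)
    return fine, n_evals

fine, n_evals = coarse_to_fine(sphere_occupancy)
print(f"function evaluations: {n_evals} of {fine.size} for a dense grid")
```

The saving in this toy setting is modest compared with the speed-up reported in the abstract, since it uses only two levels and a cheap analytic function. The point is only that the number of function queries scales with the surface area rather than the full volume.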




DeepCap: Monocular Human Performance Capture Using Weak Supervision

Human performance capture is a highly important computer vision problem ...

Learning Neural Radiance Fields from Multi-View Geometry

We present a framework, called MVG-NeRF, that combines classical Multi-V...

Polarimetric Multi-View Inverse Rendering

A polarization camera has great potential for 3D reconstruction since th...

DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras

We propose DeepMultiCap, a novel method for multi-person performance cap...

Function4D: Real-time Human Volumetric Capture from Very Sparse Consumer RGBD Sensors

Human volumetric capture is a long-standing topic in computer vision and...

PVA: Pixel-aligned Volumetric Avatars

Acquisition and rendering of photo-realistic human heads is a highly cha...

Optimisations for Real-Time Volumetric Cloudscapes

Volumetric cloudscapes are prohibitively expensive to render in real tim...
