Probabilistic Human Mesh Recovery in 3D Scenes from Egocentric Views

by Siwei Zhang et al.

Automatic perception of human behavior during social interactions is crucial for AR/VR applications, and an essential component is estimating the plausible 3D pose and shape of our social partners from the egocentric view. One of the biggest challenges of this task is severe body truncation caused by the close social distances typical of egocentric scenarios, which introduces large pose ambiguities for the unseen body parts. To tackle this challenge, we propose a novel scene-conditioned diffusion method to model the body pose distribution. Conditioned on the 3D scene geometry, the diffusion model generates bodies in plausible human-scene interactions, with sampling guided by a physics-based collision score to further resolve human-scene inter-penetrations. Classifier-free training enables flexible sampling with different conditions and enhanced diversity. A visibility-aware graph convolution model, guided by per-joint visibility, serves as the diffusion denoiser and incorporates inter-joint dependencies and per-body-part control. Extensive evaluations show that our method generates bodies in plausible interactions with 3D scenes, achieving both superior accuracy for visible joints and diversity for invisible body parts. The code will be available at
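The abstract combines two guidance signals during sampling: classifier-free guidance between scene-conditioned and unconditional noise predictions, and a physics-based collision score applied during the reverse diffusion process. The following is a minimal numpy sketch of one such guided reverse step, not the authors' implementation: the collision score here is a hypothetical ground-plane penetration penalty standing in for the paper's scene-geometry term, and `guided_reverse_step`, `w`, and `s` are illustrative names, with the denoiser outputs passed in as arrays.

```python
import numpy as np

def collision_score_grad(x, floor_z=0.0):
    """Gradient of a toy collision score (hypothetical stand-in for the
    paper's scene-penetration term): pushes joints that fall below a
    ground plane back above it."""
    grad = np.zeros_like(x)
    depth = floor_z - x[..., 2]            # penetration depth, > 0 if below floor
    grad[..., 2] = np.maximum(depth, 0.0)  # ascent direction out of the floor
    return grad

def guided_reverse_step(x_t, t, eps_cond, eps_uncond, alphas_cumprod, w=2.0, s=0.1):
    """One DDIM-style reverse-diffusion step combining classifier-free
    guidance (weight w between scene-conditioned and unconditional noise
    predictions) with collision-score guidance of strength s applied to
    the predicted clean pose."""
    eps = (1.0 + w) * eps_cond - w * eps_uncond               # classifier-free guidance
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1] if t > 0 else 1.0
    x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)  # predicted clean sample
    x0_hat = x0_hat + s * collision_score_grad(x0_hat)        # resolve inter-penetration
    return np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
```

With `s > 0`, joints predicted below the floor are nudged upward before the step is completed, which is the role the collision score plays in resolving human-scene inter-penetrations during sampling.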



Related papers

EgoBody: Human Body Shape, Motion and Social Interactions from Head-Mounted Devices
Understanding social interactions from first-person views is crucial for...

Single Image Human Proxemics Estimation for Visual Social Distancing
In this work, we address the problem of estimating the so-called "Social...

Generating 3D People in Scenes without People
We present a fully-automatic system that takes a 3D scene and generates ...

HuManiFlow: Ancestor-Conditioned Normalising Flows on SO(3) Manifolds for Human Pose and Shape Distribution Estimation
Monocular 3D human pose and shape estimation is an ill-posed problem sin...

HULC: 3D Human Motion Capture with Pose Manifold Sampling and Dense Contact Guidance
Marker-less monocular 3D human motion capture (MoCap) with scene interac...

Pipeline for 3D reconstruction of the human body from AR/VR headset mounted egocentric cameras
In this paper, we propose a novel pipeline for the 3D reconstruction of ...

Visually plausible human-object interaction capture from wearable sensors
In everyday lives, humans naturally modify the surrounding environment t...
