PoseKernelLifter: Metric Lifting of 3D Human Pose using Sound

by Zhijian Yang, et al.

Reconstructing the 3D pose of a person in metric scale from a single-view image is a geometrically ill-posed problem. For example, we cannot measure the exact distance of a person to the camera from a single-view image without additional scene assumptions (e.g., known height). Existing learning-based approaches circumvent this issue by reconstructing the 3D pose up to scale. However, many applications, such as virtual telepresence, robotics, and augmented reality, require metric-scale reconstruction. In this paper, we show that audio signals recorded along with an image provide complementary information for reconstructing the metric 3D pose of the person. The key insight is that as the audio signals traverse the 3D space, their interactions with the body provide metric information about the body's pose. Based on this insight, we introduce a time-invariant transfer function called the pose kernel: the impulse response of audio signals induced by the body pose. The main properties of the pose kernel are that (1) its envelope correlates highly with 3D pose, (2) its time response corresponds to arrival time, indicating the metric distance to the microphone, and (3) it is invariant to changes in scene geometry. It is therefore readily generalizable to unseen scenes. We design a multi-stage 3D CNN that fuses audio and visual signals and learns to reconstruct 3D pose in metric scale. We show that our multi-modal method produces accurate metric reconstruction in real-world scenes, which is not possible with state-of-the-art lifting approaches, including parametric mesh regression and depth regression.
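To make property (2) concrete, the following is a minimal sketch of how an arrival time read off an impulse response maps to a metric length. It is not the authors' pipeline: the regularized frequency-domain deconvolution, the function names `estimate_impulse_response` and `path_length_from_peak`, the speed of sound of 343 m/s, and the synthetic single-echo signal are all assumptions chosen for illustration. The recovered quantity is the total travel path of the sound, not a full pose.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C


def estimate_impulse_response(source, received, eps=1e-3):
    """Estimate an impulse response by regularized frequency-domain deconvolution.

    Illustrative stand-in only; the abstract does not state how the pose
    kernel is actually computed.
    """
    n = len(source) + len(received) - 1  # pad to avoid circular wrap-around
    S = np.fft.rfft(source, n)
    R = np.fft.rfft(received, n)
    # Divide R by S, with eps guarding against near-zero spectral bins.
    H = R * np.conj(S) / (np.abs(S) ** 2 + eps)
    return np.fft.irfft(H, n)


def path_length_from_peak(h, sample_rate):
    """Convert the delay of the strongest peak into a travel path in meters."""
    delay_samples = int(np.argmax(np.abs(h)))
    return delay_samples / sample_rate * SPEED_OF_SOUND


# Synthetic check: one attenuated copy of a noise burst, delayed by 280 samples.
rng = np.random.default_rng(0)
sample_rate = 48_000  # Hz, hypothetical recording rate
source = rng.standard_normal(4096)
true_delay = 280
received = np.zeros(len(source) + 512)
received[true_delay:true_delay + len(source)] += 0.5 * source

h = estimate_impulse_response(source, received)
path_m = path_length_from_peak(h, sample_rate)
print(f"estimated travel path: {path_m:.3f} m")
```

In this toy setting the peak of the estimated response sits at the injected delay, and multiplying the delay in seconds by the speed of sound gives a length in meters. That is the sense in which arrival time carries metric information that a single image lacks.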




