Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling

by Xiaozheng Zheng, et al.

To bridge the physical and virtual worlds in rapidly developing VR/AR applications, the ability to realistically drive 3D full-body avatars is of great significance. Although real-time body tracking with only head-mounted displays (HMDs) and hand controllers is heavily under-constrained, a carefully designed end-to-end neural network has great potential to solve the problem by learning from large-scale motion data. To this end, we propose a two-stage framework that obtains accurate and smooth full-body motion from only the three tracking signals of the head and hands. The first stage explicitly models joint-level features; the second stage uses them as spatiotemporal tokens for alternating spatial and temporal transformer blocks that capture joint-level correlations. Furthermore, we design a set of loss terms to constrain this high-degree-of-freedom task, so that we can exploit the potential of our joint-level modeling. With extensive experiments on the AMASS motion dataset and real-captured data, we validate the effectiveness of our designs and show that our proposed method achieves more accurate and smoother motion than existing approaches.
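
As a rough illustration of the second stage, the sketch below alternates attention across joints (spatial) and across frames (temporal) over a grid of joint-level tokens. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the token shape `(T, J, D)`, the values `T=8`, `J=22`, `D=16`, the single-head attention without layer normalization or feed-forward layers, and the names `self_attention` and `alternating_blocks` are all hypothetical. Random tokens stand in for the joint-level features that the first stage would produce.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x has shape (..., N, D); attention mixes the N tokens on the
    second-to-last axis, independently for every leading index.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def alternating_blocks(tokens, params):
    """Apply alternating spatial and temporal attention with residuals.

    tokens: (T, J, D) array, one D-dimensional token per joint per frame.
    params: list of (spatial_weights, temporal_weights) pairs, each a
            tuple of three (D, D) projection matrices (hypothetical layout).
    """
    x = tokens
    for spatial_w, temporal_w in params:
        # Spatial block: within each frame, joints attend to each other.
        x = x + self_attention(x, *spatial_w)
        # Temporal block: for each joint, frames attend to each other.
        xt = np.swapaxes(x, 0, 1)              # (J, T, D)
        xt = xt + self_attention(xt, *temporal_w)
        x = np.swapaxes(xt, 0, 1)              # back to (T, J, D)
    return x


rng = np.random.default_rng(0)
T, J, D = 8, 22, 16  # frames, joints, feature size (all assumed values)


def make_weights():
    # Small random projections; a trained model would learn these.
    return tuple(rng.standard_normal((D, D)) * 0.1 for _ in range(3))


tokens = rng.standard_normal((T, J, D))  # placeholder for stage-one features
params = [(make_weights(), make_weights()) for _ in range(2)]
out = alternating_blocks(tokens, params)
print(out.shape)
```

The swap of the first two axes is the only difference between the two block types: the same attention routine mixes joints when the token grid is laid out as `(T, J, D)` and mixes frames when it is laid out as `(J, T, D)`.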



Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model

With the recent surge in popularity of AR/VR applications, realistic and...

BoDiffusion: Diffusing Sparse Observations for Full-Body Human Motion Synthesis

Mixed reality applications require tracking the user's full-body motion ...

QuestSim: Human Motion Tracking from Sparse Sensors with Simulated Avatars

Real-time tracking of human body motion is crucial for interactive and i...

HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations

Generating both plausible and accurate full body avatar motion is the ke...

LoBSTr: Real-time Lower-body Pose Prediction from Sparse Upper-body Tracking Signals

With the popularization of game and VR/AR devices, there is a growing ne...

Coordinate Transformer: Achieving Single-stage Multi-person Mesh Recovery from Videos

Multi-person 3D mesh recovery from videos is a critical first step towar...

Learning Variational Motion Prior for Video-based Motion Capture

Motion capture from a monocular video is fundamental and crucial for us ...
