A Dual-Masked Auto-Encoder for Robust Motion Capture with Spatial-Temporal Skeletal Token Completion

07/15/2022
by Junkun Jiang, et al.

Multi-person motion capture can be challenging because of ambiguities caused by severe occlusion, fast body movement, and complex interactions. Existing frameworks build on 2D pose estimation and triangulate to 3D coordinates by reasoning about appearance, trajectory, and geometric consistency across multi-camera observations. However, 2D joint detections are often incomplete and carry wrong identity assignments because of limited observation angles, which leads to noisy 3D triangulation results. To overcome this issue, we propose to explore the short-range autoregressive characteristics of skeletal motion using a transformer. First, we propose an adaptive, identity-aware triangulation module that reconstructs 3D joints and identifies the missing joints for each identity. To generate complete 3D skeletal motion, we then propose a Dual-Masked Auto-Encoder (D-MAE), which encodes the joint status with both skeletal-structural and temporal position encoding for trajectory completion. D-MAE's flexible masking and encoding mechanism allows arbitrary skeleton definitions to be conveniently deployed under the same framework. To demonstrate the proposed model's capability in dealing with severe data loss, we contribute a high-accuracy and challenging motion capture dataset of multi-person interactions with severe occlusion. Evaluations on both a benchmark and our new dataset demonstrate the efficiency of the proposed model, as well as its advantage over other state-of-the-art methods.
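
The triangulation step described in the abstract can be illustrated with plain linear (DLT) triangulation. The following is only a minimal sketch, not the paper's adaptive, identity-aware module: it assumes that detections have already been associated with a single identity, that each camera has a known 3x4 projection matrix, and that an undetected joint is represented by None. The function names triangulate_joint and triangulate_skeleton are hypothetical. Joints seen by fewer than two views are flagged as missing, which is the kind of gap a completion model would then be asked to fill.

```python
import numpy as np


def triangulate_joint(projections, points_2d):
    """Linear (DLT) triangulation of one joint.

    projections: sequence of 3x4 camera projection matrices (assumed known).
    points_2d:   sequence of (x, y) detections, or None where the joint
                 was not detected in that view.
    Returns a 3-vector, or None if fewer than two views observed the joint.
    """
    rows = []
    for P, pt in zip(projections, points_2d):
        if pt is None:
            continue  # this view did not observe the joint
        x, y = pt
        # Each observation contributes two linear constraints on the
        # homogeneous 3D point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    if len(rows) < 4:  # fewer than two views: cannot triangulate
        return None
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]  # assumes the point is not at infinity


def triangulate_skeleton(projections, detections):
    """Triangulate every joint of one person and flag the missing ones.

    detections[v][j] is the (x, y) detection of joint j in view v, or None.
    Returns (joints, missing): a (J, 3) array and a boolean (J,) mask.
    """
    num_joints = len(detections[0])
    joints = np.zeros((num_joints, 3))
    missing = np.zeros(num_joints, dtype=bool)
    for j in range(num_joints):
        X = triangulate_joint(projections, [view[j] for view in detections])
        if X is None:
            missing[j] = True  # left at zero; to be completed downstream
        else:
            joints[j] = X
    return joints, missing
```

Calling triangulate_skeleton once per frame yields a sequence of partially observed skeletons together with a per-joint missing mask. Wrong 2D identity assignments, which the abstract names as a second source of noise, are not handled by this sketch.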
