Learning Variational Motion Prior for Video-based Motion Capture

10/27/2022
by Xin Chen, et al.

Motion capture from a monocular video is fundamental and crucial for humans to naturally experience and interact with each other in Virtual Reality (VR) and Augmented Reality (AR). However, existing methods still struggle with challenging cases involving self-occlusion and complex poses due to the lack of effective motion prior modeling. In this paper, we present a novel variational motion prior (VMP) learning approach for video-based motion capture to resolve the above issue. Instead of directly building the correspondence between the video and motion domains, we propose to learn a generic latent space capturing the prior distribution of all natural motions, which serves as the basis for subsequent video-based motion capture tasks. To improve the generalization capacity of the prior space, we propose a transformer-based variational autoencoder pretrained over marker-based 3D mocap data, with a novel style-mapping block to boost the generation quality. Afterward, a separate video encoder is attached to the pretrained motion generator for end-to-end fine-tuning over task-specific video datasets. Compared to existing motion prior models, our VMP model serves as a motion rectifier that can effectively reduce temporal jittering and failure modes in frame-wise pose estimation, leading to temporally stable and visually realistic motion capture results. Furthermore, our VMP-based framework models motion at the sequence level and can directly generate motion clips in a single forward pass, achieving real-time motion capture during inference. Extensive experiments on both public datasets and in-the-wild videos demonstrate the efficacy and generalization capability of our framework.
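The abstract describes a two-stage design: a transformer-based motion VAE with a style-mapping block is pretrained on mocap sequences, and a separate video encoder is then attached to the pretrained motion generator for end-to-end fine-tuning. The PyTorch sketch below illustrates one possible reading of that pipeline; all module names, layer sizes, the style-mapping MLP, and the video-feature interface are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a sequence-level motion VAE with a
# transformer backbone, a style-mapping MLP, and a separate video encoder that
# is attached to the pretrained motion decoder for fine-tuning.
# Module names, dimensions, and the video-feature shape are all assumptions.

import torch
import torch.nn as nn

class MotionVAE(nn.Module):
    def __init__(self, pose_dim=72, d_model=256, latent_dim=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(pose_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.to_mu = nn.Linear(d_model, latent_dim)
        self.to_logvar = nn.Linear(d_model, latent_dim)
        # "Style-mapping" block: an MLP that maps the sampled latent to a
        # modulation code fed to the decoder (assumed, StyleGAN-like mapping).
        self.style_map = nn.Sequential(
            nn.Linear(latent_dim, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
        dec_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, n_layers)
        self.out = nn.Linear(d_model, pose_dim)

    def encode(self, motion):                     # motion: (B, T, pose_dim)
        h = self.encoder(self.embed(motion))      # (B, T, d_model)
        h = h.mean(dim=1)                         # sequence-level summary
        return self.to_mu(h), self.to_logvar(h)

    def decode(self, z, T):
        style = self.style_map(z)                 # (B, d_model)
        tokens = style.unsqueeze(1).expand(-1, T, -1)
        return self.out(self.decoder(tokens))     # (B, T, pose_dim)

    def forward(self, motion):
        mu, logvar = self.encode(motion)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decode(z, motion.shape[1]), mu, logvar

class VideoEncoder(nn.Module):
    """Maps per-frame video features to the motion latent (assumed interface)."""
    def __init__(self, feat_dim=2048, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                 nn.Linear(512, latent_dim))

    def forward(self, video_feats):               # (B, T, feat_dim)
        return self.net(video_feats.mean(dim=1))  # pooled clip feature -> latent

# Stage 1: pretrain MotionVAE on mocap sequences (reconstruction + KL losses).
# Stage 2: attach VideoEncoder to the pretrained decoder and fine-tune end to end.
vae = MotionVAE()
video_enc = VideoEncoder()
feats = torch.randn(2, 64, 2048)                  # dummy clip features, T = 64 frames
motion = vae.decode(video_enc(feats), T=64)       # (2, 64, 72) predicted pose sequence
```

Because the decoder consumes a whole clip-level latent at once, a single forward pass yields an entire motion clip, which is consistent with the abstract's claim of sequence-level modeling and real-time inference.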

