Multi-Scale Networks for 3D Human Pose Estimation with Inference Stage Optimization

10/13/2020
by Cheng Yu, et al.

Estimating 3D human poses from a monocular video is still a challenging task. The performance of many existing methods drops when the target person is occluded by other objects, or when the motion is too fast or too slow relative to the scale and speed of the training data. Moreover, many of these methods are not explicitly designed or trained for severe occlusion, which compromises their ability to handle it. To address these problems, we introduce a spatio-temporal network for robust 3D human pose estimation. As humans in videos may appear at different scales and move at various speeds, we apply multi-scale spatial features for 2D joint (keypoint) prediction in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate 3D joints (keypoints). Furthermore, we design a spatio-temporal discriminator based on body structures as well as limb motions to assess whether the predicted pose forms a valid pose and a valid movement. During training, we explicitly mask out some keypoints to simulate various occlusion cases, from minor to severe, so that our network learns to be robust to various degrees of occlusion. As 3D ground-truth data are limited, we further utilize 2D video data to add a semi-supervised learning capability to our network. Moreover, we observe a discrepancy between 3D pose prediction and 2D pose estimation, caused by different pose variations between the video and image training datasets. We therefore propose a confidence-based inference stage optimization that adaptively enforces the 3D pose projection to match the 2D pose estimation, further improving the final pose prediction accuracy. Experiments on public datasets validate the effectiveness of our method, and our ablation studies show the strengths of our network's individual submodules.
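The keypoint masking described in the abstract can be illustrated with a minimal sketch. The abstract does not say which joints are masked, what value replaces them, or how the occlusion level is scheduled, so everything below is an assumption made for illustration: the function name `mask_keypoints`, the `occlusion_ratio` parameter, the uniform random choice of joints, and the zero fill value are hypothetical and not taken from the paper.

```python
import random


def mask_keypoints(keypoints, occlusion_ratio, rng=None, fill=(0.0, 0.0)):
    """Simulate occlusion by masking a random subset of 2D keypoints.

    Hypothetical helper, not the authors' implementation.
    keypoints:        list of (x, y) tuples for one frame.
    occlusion_ratio:  fraction of keypoints to mask, from 0.0 (none)
                      to 1.0 (all); the count is rounded to an integer.
    Returns (masked_keypoints, visibility), where visibility[i] is 1 for
    a kept keypoint and 0 for a masked one.
    """
    if not 0.0 <= occlusion_ratio <= 1.0:
        raise ValueError("occlusion_ratio must be in [0, 1]")
    if rng is None:
        rng = random.Random()

    n = len(keypoints)
    n_masked = round(occlusion_ratio * n)
    # Assumption: joints are chosen uniformly at random, without replacement.
    masked = set(rng.sample(range(n), n_masked))

    masked_keypoints = [
        fill if i in masked else tuple(kp) for i, kp in enumerate(keypoints)
    ]
    visibility = [0 if i in masked else 1 for i in range(n)]
    return masked_keypoints, visibility


# Example: a toy 10-joint pose with half of the joints masked.
pose = [(float(i), float(i) + 0.5) for i in range(10)]
masked_pose, visibility = mask_keypoints(pose, 0.5, rng=random.Random(0))
```

Raising `occlusion_ratio` over the course of training would be one way to move from minor to severe occlusion. The returned visibility list could be passed to the network as an extra input, although the abstract does not state whether the method does this.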

research
04/24/2023

Occlusion Robust 3D Human Pose Estimation with StridedPoseGraphFormer and Data Augmentation

Occlusion is an omnipresent challenge in 3D human pose estimation (HPE)....
research
07/20/2022

OTPose: Occlusion-Aware Transformer for Pose Estimation in Sparsely-Labeled Videos

Although many approaches for multi-human pose estimation in videos have ...
research
03/08/2022

Quantification of Occlusion Handling Capability of a 3D Human Pose Estimation Framework

3D human pose estimation using monocular images is an important yet chal...
research
09/21/2023

ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding

In 3D human shape and pose estimation from a monocular video, models tra...
research
06/24/2020

3D Pose Detection in Videos: Focusing on Occlusion

In this work, we build upon existing methods for occlusion-aware 3D pose...
research
06/10/2021

Adversarial Motion Modelling helps Semi-supervised Hand Pose Estimation

Hand pose estimation is difficult due to different environmental conditi...
research
07/25/2022

Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation

3D Human body pose and shape estimation within a temporal sequence can b...
