SRNet: Improving Generalization in 3D Human Pose Estimation with a Split-and-Recombine Approach

by Ailing Zeng, et al.

Human poses that are rare or unseen in a training set are challenging for a network to predict. Similar to the long-tailed distribution problem in visual recognition, the small number of examples for such poses limits the ability of networks to model them. Interestingly, local pose distributions suffer less from the long-tail problem, i.e., local joint configurations within a rare pose may appear within other poses in the training set, making them less rare. We propose to take advantage of this fact for better generalization to rare and unseen poses. Specifically, our method splits the body into local regions and processes them in separate network branches, utilizing the property that a joint position depends mainly on the joints within its local body region. Global coherence is maintained by recombining the global context from the rest of the body into each branch as a low-dimensional vector. With the reduced dimensionality of less relevant body areas, the training set distribution within network branches more closely reflects the statistics of local poses instead of global body poses, without sacrificing information important for joint inference. The proposed split-and-recombine approach, called SRNet, can be easily adapted to both single-image and temporal models, and it leads to appreciable improvements in the prediction of rare and unseen poses.
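The split-and-recombine idea can be sketched as a single layer in Python with NumPy. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation: the 17-joint ordering and the five-way body grouping (`GROUPS`) are assumed Human3.6M-style choices, `CONTEXT_DIM` and the class name `SplitRecombineLayer` are hypothetical, and each branch is a single random linear map rather than a trained deep network. Each branch maps the 2D coordinates of its own joints, plus a low-dimensional projection of all other joints, to the 3D coordinates of its joints.

```python
import numpy as np

# Hypothetical 5-way split of a 17-joint skeleton (Human3.6M-style ordering assumed);
# the abstract does not specify the exact partition SRNet uses.
GROUPS = [
    [0, 7, 8, 9, 10],  # torso and head
    [1, 2, 3],         # right leg
    [4, 5, 6],         # left leg
    [11, 12, 13],      # left arm
    [14, 15, 16],      # right arm
]
NUM_JOINTS = 17
CONTEXT_DIM = 8  # hypothetical size of the low-dimensional global-context vector


class SplitRecombineLayer:
    """Illustrative split-and-recombine layer with random, untrained linear branches."""

    def __init__(self, groups, num_joints, context_dim, rng):
        self.num_joints = num_joints
        self.branches = []
        for group in groups:
            rest = [j for j in range(num_joints) if j not in group]
            # Compresses the 2D coordinates of every joint outside the group
            # into a context_dim-sized global-context vector.
            w_ctx = rng.standard_normal((2 * len(rest), context_dim)) * 0.1
            # Maps [local 2D coordinates, context vector] to the group's 3D coordinates.
            w_out = rng.standard_normal((2 * len(group) + context_dim, 3 * len(group))) * 0.1
            self.branches.append((group, rest, w_ctx, w_out))

    def __call__(self, pose_2d):
        # pose_2d: array of shape (batch, num_joints, 2)
        batch = pose_2d.shape[0]
        pose_3d = np.zeros((batch, self.num_joints, 3))
        for group, rest, w_ctx, w_out in self.branches:
            local = pose_2d[:, group, :].reshape(batch, -1)           # full-resolution local joints
            context = pose_2d[:, rest, :].reshape(batch, -1) @ w_ctx  # low-dimensional rest of body
            features = np.concatenate([local, context], axis=1)
            # Recombine: write each branch's prediction back into the full-body output.
            pose_3d[:, group, :] = (features @ w_out).reshape(batch, len(group), 3)
        return pose_3d


rng = np.random.default_rng(0)
layer = SplitRecombineLayer(GROUPS, NUM_JOINTS, CONTEXT_DIM, rng)
pose_3d = layer(rng.standard_normal((4, NUM_JOINTS, 2)))
print(pose_3d.shape)  # (4, 17, 3)
```

Each branch sees its own joints at full resolution, while the rest of the body reaches it only through the `CONTEXT_DIM`-sized vector. Keeping that vector small is what, per the abstract's argument, makes each branch's input statistics resemble local rather than global pose distributions.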




PoseTrans: A Simple Yet Effective Pose Transformation Augmentation for Human Pose Estimation

Human pose estimation aims to accurately estimate a wide variety of huma...

Chasing the Tail in Monocular 3D Human Reconstruction with Prototype Memory

Deep neural networks have achieved great progress in single-image 3D hum...

CameraPose: Weakly-Supervised Monocular 3D Human Pose Estimation by Leveraging In-the-wild 2D Annotations

To improve the generalization of 3D human pose estimators, many existing...

Improving Robustness and Accuracy via Relative Information Encoding in 3D Human Pose Estimation

Most of the existing 3D human pose estimation approaches mainly focus on...

Cascaded deep monocular 3D human pose estimation with evolutionary training data

End-to-end deep representation learning has achieved remarkable accuracy...

View-Invariant Probabilistic Embedding for Human Pose

Depictions of similar human body configurations can vary with changing v...

StyleGAN-Human: A Data-Centric Odyssey of Human Generation

Unconditional human image generation is an important task in vision and ...
