Relation-Based Associative Joint Location for Human Pose Estimation in Videos

by Yonghao Dang, et al.

Video-based human pose estimation (HPE) is a vital yet challenging task. While deep learning methods have made significant progress on HPE, most approaches detect each joint independently, which damages the structural information of the pose. In this paper, unlike prior methods, we propose a Relation-based Pose Semantics Transfer Network (RPSTN) to locate joints associatively. Specifically, we design a lightweight joint relation extractor (JRE) to model the pose structural features and associatively generate heatmaps for joints by modeling the relation between any two joints heuristically, instead of building each joint heatmap independently. In effect, the JRE module models the spatial configuration of human poses through the relationship between any two joints. Moreover, given the temporal semantic continuity of videos, the pose semantic information in the current frame is beneficial for guiding the location of joints in the next frame. Therefore, we use the idea of knowledge reuse to propagate the pose semantic information between consecutive frames. In this way, the proposed RPSTN captures the temporal dynamics of poses. On the one hand, in space, the JRE module can infer invisible joints from their relationship with other visible joints. On the other hand, in time, the proposed model can transfer the pose semantic features from a non-occluded frame to an occluded frame to locate occluded joints. Therefore, our method is robust to occlusion and achieves state-of-the-art results on two challenging datasets, which demonstrates its effectiveness for video-based human pose estimation. We will release the code and models publicly.
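To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' RPSTN or JRE implementation. It assumes a simple softmax-normalized dot-product affinity between per-joint feature maps as a stand-in for the joint relation, and a fixed linear blend as a stand-in for propagating pose semantics between consecutive frames. The function names `joint_relation_heatmaps` and `propagate`, the blend weight `alpha`, and the tensor sizes are all hypothetical choices made for illustration.

```python
import numpy as np

def joint_relation_heatmaps(features):
    """Mix per-joint maps using pairwise joint affinities (illustrative only).

    features: array of shape (K, H, W), one feature map per joint.
    Returns (mixed_maps of shape (K, H, W), relation matrix of shape (K, K)).
    """
    k, h, w = features.shape
    flat = features.reshape(k, h * w)
    # Affinity between every pair of joints; assumed form, not the paper's JRE.
    scores = flat @ flat.T / np.sqrt(h * w)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    relation = np.exp(scores)
    relation = relation / relation.sum(axis=1, keepdims=True)
    # Each joint's output map is a relation-weighted combination of all maps,
    # so no joint is located independently of the others.
    mixed = relation @ flat
    return mixed.reshape(k, h, w), relation

def propagate(prev_maps, curr_features, alpha=0.5):
    """Blend the previous frame's maps into the current features (hypothetical)."""
    return alpha * prev_maps + (1.0 - alpha) * curr_features

# Random stand-in data: 3 frames, 13 joints, 16x16 maps (arbitrary sizes).
rng = np.random.default_rng(0)
frames = rng.random((3, 13, 16, 16))

maps = None
outputs = []
for feats in frames:
    if maps is not None:
        # Reuse the previous frame's pose information to guide this frame.
        feats = propagate(maps, feats)
    maps, relation = joint_relation_heatmaps(feats)
    outputs.append(maps)
```

In this sketch, a joint whose own map is uninformative in one frame still receives signal from related joints through `relation` and from the previous frame through `propagate`, which mirrors the spatial and temporal arguments for occlusion robustness made in the abstract. A learned model would replace both the affinity and the blend with trained components.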



Leveraging Temporal Joint Depths for Improving 3D Human Pose Estimation in Video

The effectiveness of the approaches to predict 3D poses from 2D poses es...

Learning Human Kinematics by Modeling Temporal Correlations between Joints for Video-based Human Pose Estimation

Estimating human poses from videos is critical in human-computer interac...

Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking

Multi-person pose estimation and tracking serve as crucial steps for vid...

Temporal Smoothing for 3D Human Pose Estimation and Localization for Occluded People

In multi-person pose estimation actors can be heavily occluded, even bec...

T-LEAP: occlusion-robust pose estimation of walking cows using temporal information

As herd size on dairy farms continue to increase, automatic health monit...

(Fusionformer):Exploiting the Joint Motion Synergy with Fusion Network Based On Transformer for 3D Human Pose Estimation

For the current 3D human pose estimation task, in order to improve the e...

Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation

Accurately estimating 3D hand pose is crucial for understanding how huma...
