HSTFormer: Hierarchical Spatial-Temporal Transformers for 3D Human Pose Estimation

01/18/2023
by Xiaoye Qian, et al.

Transformer-based approaches have been successfully applied to 3D human pose estimation (HPE) from 2D pose sequences and have achieved state-of-the-art (SOTA) performance. However, current SOTA methods have difficulty modeling the spatial-temporal correlations of joints at different levels simultaneously. This is due to the spatial-temporal complexity of poses: they move at various speeds temporally, with various joint and body-part movements spatially. Hence, a cookie-cutter transformer is not adaptable and can hardly meet the "in-the-wild" requirement. To mitigate this issue, we propose Hierarchical Spatial-Temporal transFormers (HSTFormer) to capture multi-level spatial-temporal correlations of joints gradually, from local to global, for accurate 3D HPE. HSTFormer consists of four transformer encoders (TEs) and a fusion module. To the best of our knowledge, HSTFormer is the first to study hierarchical TEs with multi-level fusion. Extensive experiments on three datasets (i.e., Human3.6M, MPI-INF-3DHP, and HumanEva) demonstrate that HSTFormer achieves competitive and consistent performance on benchmarks of various scales and difficulties. Specifically, it surpasses recent SOTA methods on the challenging MPI-INF-3DHP dataset and the small-scale HumanEva dataset, with a highly generalized systematic approach. The code is available at: https://github.com/qianxiaoye825/HSTFormer.


