Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation

09/06/2021
by   Ziniu Wan, et al.
3D human shape and pose estimation is an essential task for human motion analysis and is widely used in 3D applications. However, existing methods cannot simultaneously capture relations at multiple levels, namely the spatial-temporal level and the human joint level. As a result, they fail to make accurate predictions in hard scenarios such as cluttered backgrounds, occlusion, or extreme poses. To this end, we propose the Multi-level Attention Encoder-Decoder Network (MAED), which comprises a Spatial-Temporal Encoder (STE) and a Kinematic Topology Decoder (KTD) to model multi-level attention in a unified framework. STE consists of a series of cascaded blocks based on Multi-Head Self-Attention, and each block uses two parallel branches to learn spatial and temporal attention respectively. Meanwhile, KTD models joint-level attention. It treats pose estimation as a top-down hierarchical process that follows the SMPL kinematic tree. With the training set of 3DPW, MAED outperforms previous state-of-the-art methods by 6.2, 7.2, and 2.4 mm of PA-MPJPE on the three widely used benchmarks 3DPW, MPI-INF-3DHP, and Human3.6M respectively. Our code is available at https://github.com/ziniuwan/maed.
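The "top-down hierarchical process" attributed to KTD can be illustrated with a small traversal of the SMPL kinematic tree. The sketch below is not taken from the MAED code: it assumes the commonly used 24-joint SMPL parent table, and the helper names `ancestors` and `top_down_order` are hypothetical. It shows only the order in which joints would be visited and which ancestors a joint could be conditioned on. How KTD actually uses those ancestors is not specified in the abstract.

```python
from collections import deque

# Assumption: the standard 24-joint SMPL parent table; -1 marks the root (pelvis).
SMPL_PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8,
                9, 9, 9, 12, 13, 14, 16, 17, 18, 19, 20, 21]


def ancestors(joint, parents=SMPL_PARENTS):
    """Return the ancestor joint indices of `joint`, ordered root first."""
    chain = []
    p = parents[joint]
    while p != -1:
        chain.append(p)
        p = parents[p]
    return chain[::-1]


def top_down_order(parents=SMPL_PARENTS):
    """Return joint indices breadth-first so every parent precedes its children."""
    children = {j: [] for j in range(len(parents))}
    root = None
    for j, p in enumerate(parents):
        if p == -1:
            root = j
        else:
            children[p].append(j)
    order, queue = [], deque([root])
    while queue:
        j = queue.popleft()
        order.append(j)
        queue.extend(children[j])
    return order


if __name__ == "__main__":
    # Hypothetical usage: visit joints from the root outward and list the
    # ancestors each joint's prediction could depend on.
    for j in top_down_order():
        print(j, ancestors(j))
```

In this ordering, a joint near the end of a limb (for example index 20, the left wrist in the assumed joint layout) is reached only after the whole chain from the pelvis through the spine, collar, shoulder, and elbow has been processed.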

research
01/18/2023

HSTFormer: Hierarchical Spatial-Temporal Transformers for 3D Human Pose Estimation

Transformer-based approaches have been successfully proposed for 3D huma...
research
11/29/2022

Kinematic-aware Hierarchical Attention Network for Human Pose Estimation in Videos

Previous video-based human pose estimation methods have shown promising ...
research
03/17/2022

MatchFormer: Interleaving Attention in Transformers for Feature Matching

Local feature matching is a computationally intensive task at the subpix...
research
05/31/2022

Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking

The recent trend in multiple object tracking (MOT) is heading towards le...
research
03/15/2022

P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation

This paper introduces a novel Pre-trained Spatial Temporal Many-to-One (...
research
11/26/2019

Multi-Level Network for High-Speed Multi-Person Pose Estimation

In multi-person pose estimation, the left/right joint type discriminatio...
research
12/01/2020

Structured Context Enhancement Network for Mouse Pose Estimation

Automated analysis of mouse behaviours is crucial for many applications ...
