Lifting Transformer for 3D Human Pose Estimation in Video

03/26/2021
by Wenhao Li, et al.

Despite great progress in video-based 3D human pose estimation, it is still challenging to learn a discriminative single-pose representation from redundant sequences. To this end, we propose a novel Transformer-based architecture, called Lifting Transformer, that lifts a sequence of 2D joint locations to a 3D pose. Specifically, a vanilla Transformer encoder (VTE) is adopted to model long-range dependencies of 2D pose sequences. To reduce the redundancy of the sequence and aggregate information from local context, the fully-connected layers in the feed-forward network of VTE are replaced with strided convolutions that progressively reduce the sequence length. The modified VTE is termed the strided Transformer encoder (STE), and it is built upon the outputs of VTE. STE not only significantly reduces the computation cost but also effectively aggregates information into a single-vector representation in a global and local fashion. Moreover, a full-to-single supervision scheme is employed at both the full-sequence scale and the single-target-frame scale, applied to the outputs of VTE and STE, respectively. This scheme imposes extra temporal smoothness constraints in conjunction with the single-target-frame supervision. The proposed architecture is evaluated on two challenging benchmark datasets, Human3.6M and HumanEva-I, and achieves state-of-the-art results with far fewer parameters.
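The progressive length reduction performed by the strided convolutions can be illustrated with a small sketch. It is not the authors' implementation: the 27-frame input, the kernel size of 3, the strides (3, 3, 3), and the helper names are all assumptions chosen for illustration, and a plain window average stands in for the learned convolution weights. The attention sublayers of STE are omitted entirely.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def conv_output_length(length: int, kernel: int, stride: int) -> int:
    """Output length of an unpadded 1D convolution over `length` frames."""
    if length < kernel:
        raise ValueError("sequence is shorter than the kernel")
    return (length - kernel) // stride + 1


def strided_mean_conv(seq: List[Vector], kernel: int, stride: int) -> List[Vector]:
    """Hypothetical stand-in for a learned strided convolution.

    Each output frame is the average of `kernel` consecutive input frames,
    and the window advances by `stride` frames, so the sequence shrinks.
    """
    out: List[Vector] = []
    for start in range(0, len(seq) - kernel + 1, stride):
        window = seq[start:start + kernel]
        dim = len(window[0])
        out.append([sum(frame[d] for frame in window) / kernel for d in range(dim)])
    return out


def reduce_sequence(
    seq: List[Vector], kernel: int = 3, strides: Sequence[int] = (3, 3, 3)
) -> Tuple[List[Vector], List[int]]:
    """Apply one strided layer per entry in `strides`; record each length."""
    lengths = [len(seq)]
    for stride in strides:
        seq = strided_mean_conv(seq, kernel, stride)
        lengths.append(len(seq))
    return seq, lengths


# Toy input: 27 frames, each a 2-dimensional feature vector (assumed sizes).
frames = [[float(t), float(2 * t)] for t in range(27)]
final, lengths = reduce_sequence(frames)
print(lengths)  # [27, 9, 3, 1]
print(final)    # [[13.0, 26.0]]
```

With these assumed settings the sequence shrinks from 27 frames to a single vector in three layers, which mirrors the way STE is described as aggregating a sequence into a single-vector representation. Each layer also processes a shorter sequence than the one before it, which is consistent with the stated reduction in computation cost.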


