Distilling Token-Pruned Pose Transformer for 2D Human Pose Estimation

04/12/2023
by Feixiang Ren, et al.

Human pose estimation has seen widespread use of transformer models in recent years. Pose transformers benefit from the self-attention map, which captures the correlation between human joint tokens and the image. However, training such models is computationally expensive. The recent Token-Pruned Pose Transformer (PPT) addresses this problem by pruning the background tokens of the image, which are usually less informative. Although pruning improves efficiency, it inevitably makes PPT perform worse than TokenPose. To overcome this problem, we present a novel method called Distilling Pruned-Token Transformer for human pose estimation (DPPT). Our method uses the output of a pre-trained TokenPose to supervise the learning process of PPT. We also establish connections between the internal structures of the pose transformer and PPT, such as attention maps and joint features. Our experimental results on the MPII dataset show that DPPT significantly improves PCK over previous PPT models while still reducing computational complexity.
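As a rough illustration of the supervision scheme described in the abstract, the sketch below combines a task loss on the student (PPT) output with two distillation terms: one against the teacher (TokenPose) output and one against a teacher internal feature. The abstract does not give the loss form, so the use of mean squared error, the weights `alpha` and `beta`, and all function and parameter names here are assumptions made for illustration, not the authors' formulation.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length flat vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    student_out: Sequence[float],
    teacher_out: Sequence[float],
    target: Sequence[float],
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],
    alpha: float = 0.5,  # hypothetical weight for output distillation
    beta: float = 0.5,   # hypothetical weight for feature distillation
) -> float:
    """Hypothetical combined loss: task term plus two distillation terms."""
    # Task term: student prediction against the ground-truth target.
    task = mse(student_out, target)
    # Output distillation: student prediction against the frozen teacher output.
    out_kd = mse(student_out, teacher_out)
    # Feature distillation: an internal student feature (for example a
    # flattened attention map or joint feature) against the teacher's.
    feat_kd = mse(student_feat, teacher_feat)
    return task + alpha * out_kd + beta * feat_kd
```

In a sketch like this the teacher would be kept frozen, so only the student receives gradient updates; a real implementation would operate on tensors in a deep learning framework rather than on flat Python lists.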

research
03/29/2021

TFPose: Direct Human Pose Estimation with Transformers

We propose a human pose estimation framework that solves the task in the...
research
08/06/2022

IVT: An End-to-End Instance-guided Video Transformer for 3D Pose Estimation

Video 3D human pose estimation aims to localize the 3D coordinates of hu...
research
10/12/2022

Uplift and Upsample: Efficient 3D Human Pose Estimation with Uplifting Transformers

The state-of-the-art for monocular 3D human pose estimation in videos is...
research
08/07/2022

Jointformer: Single-Frame Lifting Transformer with Error Prediction and Refinement for 3D Human Pose Estimation

Monocular 3D human pose estimation technologies have the potential to gr...
research
06/09/2022

Building Spatio-temporal Transformers for Egocentric 3D Pose Estimation

Egocentric 3D human pose estimation (HPE) from images is challenging due...
research
06/07/2023

Efficient Vision Transformer for Human Pose Estimation via Patch Selection

While Convolutional Neural Networks (CNNs) have been widely successful i...
research
03/21/2023

Human Pose as Compositional Tokens

Human pose is typically represented by a coordinate vector of body joint...
