Jointformer: Single-Frame Lifting Transformer with Error Prediction and Refinement for 3D Human Pose Estimation

08/07/2022
by Sebastian Lutz, et al.

Monocular 3D human pose estimation technologies have the potential to greatly increase the availability of human movement data. The best-performing models for single-image 2D-3D lifting use graph convolutional networks (GCNs), which typically require some manual input to define the relationships between different body joints. We propose a novel transformer-based approach that uses the more generalised self-attention mechanism to learn these relationships within a sequence of tokens representing joints. We find that intermediate supervision, as well as residual connections between the stacked encoders, benefits performance. We also suggest that using error prediction as part of a multi-task learning framework improves performance by allowing the network to compensate for its confidence level. We perform extensive ablation studies to show that each of our contributions increases performance. Furthermore, we show that our approach outperforms the recent state of the art for single-frame 3D human pose estimation by a large margin. Our code and trained models are publicly available on GitHub.
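The abstract describes the architecture only at a high level. As a rough illustration, here is a minimal NumPy sketch of the ideas it names: one token per 2D joint, self-attention that learns joint-to-joint weights instead of a hand-defined graph, stacked encoders joined by residual connections, a 3D pose prediction after every encoder (intermediate supervision), and a second head that predicts each joint's error as an extra task. This is not the authors' implementation. Every name and size (`J`, `D`, `N_ENC`, `W_embed`, `joint_emb`, `W_err`, `lam`, the softplus on the error head, single-head attention) is a hypothetical choice, as are the exact form of the inter-encoder residual and the assumption that the error head regresses the per-joint Euclidean error. It shows a forward pass and the loss only, with untrained random weights and no training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

J, D, N_ENC = 17, 32, 3  # joints, token width, stacked encoders (all assumed values)


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def init_encoder(d, d_ff):
    s = 1.0 / np.sqrt(d)
    return {
        "Wq": rng.normal(0, s, (d, d)),
        "Wk": rng.normal(0, s, (d, d)),
        "Wv": rng.normal(0, s, (d, d)),
        "Wo": rng.normal(0, s, (d, d)),
        "W1": rng.normal(0, s, (d, d_ff)),
        "W2": rng.normal(0, 1.0 / np.sqrt(d_ff), (d_ff, d)),
    }


def encoder(x, p):
    """Pre-norm, single-head transformer encoder over J joint tokens of shape (J, D)."""
    h = layer_norm(x)
    q, k, v = h @ p["Wq"], h @ p["Wk"], h @ p["Wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (J, J): learned joint-to-joint weights
    x = x + (attn @ v) @ p["Wo"]
    h = layer_norm(x)
    return x + np.maximum(h @ p["W1"], 0.0) @ p["W2"]  # ReLU feed-forward


# Hypothetical parameters (untrained, randomly initialised).
W_embed = rng.normal(0, 0.5, (2, D))               # (x, y) of one joint -> token
joint_emb = rng.normal(0, 0.02, (J, D))            # tells each token which joint it is
encoders = [init_encoder(D, 2 * D) for _ in range(N_ENC)]
W_pose = rng.normal(0, 1.0 / np.sqrt(D), (D, 3))   # token -> 3D joint position
W_err = rng.normal(0, 1.0 / np.sqrt(D), (D, 1))    # token -> predicted joint error


def forward(pose2d):
    """pose2d: (J, 2). Returns one (pose3d, err_pred) pair per encoder stage."""
    x = pose2d @ W_embed + joint_emb
    outputs = []
    for p in encoders:
        x = encoder(x, p) + x  # assumed form of the residual between stacked encoders
        pose3d = x @ W_pose                             # (J, 3)
        err_pred = np.logaddexp(0.0, x @ W_err)[:, 0]   # softplus keeps it >= 0, shape (J,)
        outputs.append((pose3d, err_pred))
    return outputs


def multitask_loss(outputs, gt3d, lam=1.0):
    """Intermediate supervision: every stage's pose and error prediction is penalised."""
    total = 0.0
    for pose3d, err_pred in outputs:
        true_err = np.linalg.norm(pose3d - gt3d, axis=-1)  # per-joint Euclidean error
        pose_loss = true_err.mean()                          # MPJPE-style term
        # In an autodiff framework, true_err would be detached (stop-gradient) here.
        err_loss = np.mean((err_pred - true_err) ** 2)
        total += pose_loss + lam * err_loss
    return total / len(outputs)


outs = forward(rng.normal(size=(J, 2)))
print(len(outs), outs[-1][0].shape, outs[-1][1].shape)  # 3 (17, 3) (17,)
```

Averaging the loss over all stages is what makes the supervision "intermediate": every encoder's output is pushed towards the ground truth, not only the last one. The title also mentions a refinement stage, which this sketch does not attempt to model.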
