PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers

03/16/2023
by Zhongwei Qiu, et al.

Existing methods for multi-person video 3D human Pose and Shape Estimation (PSE) typically adopt a two-stage strategy, which first detects human instances in each frame and then performs single-person PSE with a temporal model. However, this strategy cannot capture the global spatio-temporal context among spatial instances. In this paper, we propose a new end-to-end multi-person 3D Pose and Shape estimation framework with a progressive Video Transformer, termed PSVT. In PSVT, a spatio-temporal encoder (STE) captures the global feature dependencies among spatial objects. Then, a spatio-temporal pose decoder (STPD) and a spatio-temporal shape decoder (STSD) capture the global dependencies between pose queries and feature tokens, and between shape queries and feature tokens, respectively. To handle the variations of objects over time, a novel progressive decoding scheme updates the pose and shape queries at each frame. In addition, we propose a novel pose-guided attention (PGA) for the shape decoder to better predict shape parameters. These two components strengthen the decoder of PSVT and improve performance. Extensive experiments on four datasets show that PSVT achieves state-of-the-art results.
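
The idea of progressive decoding, where queries are updated at each frame rather than reset, can be illustrated with a minimal sketch. This is not the PSVT implementation: the abstract does not specify the decoder layers, so the single-head attention, the residual update rule, the tensor sizes, and the names cross_attention and progressive_decode are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, tokens):
    """Single-head scaled dot-product attention (no learned projections).

    queries: (N, D) array of query vectors.
    tokens:  (M, D) array of feature tokens for one frame.
    Returns an (N, D) array of attended features.
    """
    d = queries.shape[-1]
    weights = softmax(queries @ tokens.T / np.sqrt(d), axis=-1)  # (N, M)
    return weights @ tokens


def progressive_decode(frame_tokens, init_queries):
    """Carry queries from frame to frame instead of resetting them.

    frame_tokens: (T, M, D) feature tokens for T frames.
    init_queries: (N, D) initial queries.
    Returns a (T, N, D) array with the updated queries for every frame.
    """
    queries = init_queries
    outputs = []
    for tokens in frame_tokens:
        # Assumption: a simple residual update; a real decoder would use
        # learned projections, multiple heads, and feed-forward layers.
        queries = queries + cross_attention(queries, tokens)
        outputs.append(queries)
    return np.stack(outputs)


# Hypothetical sizes: 4 frames, 16 tokens per frame, 3 queries, 8 channels.
rng = np.random.default_rng(0)
frame_tokens = rng.normal(size=(4, 16, 8))
init_queries = rng.normal(size=(3, 8))
decoded = progressive_decode(frame_tokens, init_queries)
print(decoded.shape)  # (4, 3, 8)
```

In this sketch, the queries used at frame t are the ones produced at frame t-1, so each query can follow the same instance as its appearance changes over time.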

research
03/02/2022

MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video

Recent transformer-based solutions have been introduced to estimate 3D h...
research
03/18/2020

STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos

Existing methods for instance segmentation in videos typically involve m...
research
10/14/2022

STAR-Transformer: A Spatio-temporal Cross Attention Transformer for Human Action Recognition

In action recognition, although the combination of spatio-temporal video...
research
12/13/2018

Nrityantar: Pose oblivious Indian classical dance sequence classification system

In this paper, we attempt to advance the research work done in human act...
research
12/19/2021

End-to-End Learning of Multi-category 3D Pose and Shape Estimation

In this paper, we study the representation of the shape and pose of obje...
research
07/09/2022

Human-centric Spatio-Temporal Video Grounding via the Combination of Mutual Matching Network and TubeDETR

In this technical report, we represent our solution for the Human-centri...
research
03/10/2023

Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers

Despite the impressive performance of vision-based pose estimators, they...
