VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation

05/25/2022
by Yuxing Chen, et al.

This paper presents the Volumetric Transformer Pose estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints in all camera views and learns spatial relationships directly in the 3D voxel space in an end-to-end fashion. The aggregated 3D features pass through 3D convolutions, are flattened into sequential embeddings, and are fed into a transformer. A residual design then concatenates the transformer output with the 3D convolutional features, which further improves performance. In addition, sparse Sinkhorn attention is employed to reduce the memory cost, a major bottleneck for volumetric representations, while still achieving excellent performance. The proposed VTP framework thus combines the high performance of the transformer with volumetric representations and can serve as a good alternative to convolutional backbones. Experiments on the Shelf, Campus and CMU Panoptic benchmarks show promising results in terms of both Mean Per Joint Position Error (MPJPE) and Percentage of Correctly estimated Parts (PCP). Our code will be available.
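
The two metrics named in the abstract can be illustrated with a minimal sketch for a single, already-matched person. The function names `mpjpe` and `pcp`, the `limbs` index pairs, and the threshold `alpha=0.5` are assumptions for illustration and are not taken from the paper. The PCP rule shown (a limb counts as correct when the mean error of its two endpoints is at most half the ground-truth limb length) is one common convention; the paper's exact evaluation protocol, including how predictions are matched to people, may differ.

```python
import math

def mpjpe(pred, gt):
    """Mean Euclidean distance between corresponding joints (same units as the input)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def pcp(pred, gt, limbs, alpha=0.5):
    """Percentage of limbs whose mean endpoint error is at most alpha * limb length.

    limbs is a list of (joint_a, joint_b) index pairs; alpha=0.5 is an assumed default.
    """
    correct = 0
    for a, b in limbs:
        limb_len = math.dist(gt[a], gt[b])  # ground-truth limb length
        err = (math.dist(pred[a], gt[a]) + math.dist(pred[b], gt[b])) / 2
        if err <= alpha * limb_len:
            correct += 1
    return 100.0 * correct / len(limbs)

# Toy skeleton with three joints and two limbs, coordinates in millimetres.
gt = [(0, 0, 0), (0, 0, 100), (0, 100, 100)]
pred = [(0, 0, 0), (0, 0, 110), (0, 100, 210)]
limbs = [(0, 1), (1, 2)]

print(mpjpe(pred, gt))       # per-joint errors are 0, 10 and 110 mm
print(pcp(pred, gt, limbs))  # only the first limb is within the threshold
```

Lower MPJPE is better, while higher PCP is better, so the two numbers move in opposite directions as accuracy improves.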

