Direct Multi-view Multi-person 3D Pose Estimation

by PetsTime, et al.
National University of Singapore

We present the Multi-view Pose transformer (MvP) for estimating multi-person 3D poses from multi-view images. Previous methods either estimate 3D joint locations from costly volumetric representations or reconstruct each person's 3D pose from multiple detected 2D poses. MvP instead directly regresses multi-person 3D poses in a clean and efficient way, without relying on intermediate tasks. Specifically, MvP represents skeleton joints as learnable query embeddings and lets them progressively attend to and reason over the multi-view information from the input images to directly regress the actual 3D joint locations. To improve the accuracy of this simple pipeline, MvP uses a hierarchical scheme to compactly represent the query embeddings of multi-person skeleton joints, together with an input-dependent query adaptation approach. Further, MvP designs a novel geometrically guided attention mechanism, called projective attention, to fuse the cross-view information for each joint more precisely. MvP also introduces a RayConv operation that integrates view-dependent camera geometry into the feature representations to augment the projective attention. We show experimentally that MvP outperforms state-of-the-art methods on several benchmarks while being much more efficient. Notably, it achieves 92.3% AP25 on the challenging Panoptic dataset, improving upon the previous best approach [36] by 9.8%. MvP is general and also extends to recovering human meshes represented by the SMPL model, which makes it useful for modeling multi-person body shapes. Code and models are available at
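
The abstract names MvP's components without defining them, so the NumPy sketch below is only an illustration under stated assumptions, not the authors' code. It assumes that (1) a hierarchical query for joint j of person n is the sum of a per-person embedding and a per-joint embedding, so N*J queries come from N + J learned vectors; (2) projective attention projects the current 3D joint estimate into every view with a 3x4 camera matrix, bilinearly samples that view's feature map at K offsets predicted from the query (in the style of deformable attention), weights the samples with a softmax, and averages the result over views; and (3) RayConv appends per-pixel unit viewing-ray directions to the feature map before a convolution. All function names (`hierarchical_queries`, `project`, `bilinear_sample`, `projective_attention`, `ray_augment`) and the weight matrices `W_off` and `W_att`, which stand in for learned linear layers, are hypothetical.

```python
import numpy as np

def hierarchical_queries(person_emb, joint_emb):
    """Assumed scheme: q[n, j] = h[n] + l[j]; (N, C) and (J, C) -> (N, J, C)."""
    return person_emb[:, None, :] + joint_emb[None, :, :]

def project(P, X):
    """Project a 3D point X (3,) with a 3x4 camera matrix P to pixel coords (u, v)."""
    x = P @ np.append(X, 1.0)  # homogeneous image point
    return x[:2] / x[2]

def bilinear_sample(feat, uv):
    """Bilinearly sample feat (H, W, C) at continuous pixel (u, v); zeros outside."""
    H, W, C = feat.shape
    u, v = uv
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    out = np.zeros(C)
    for dv in (0, 1):
        for du in (0, 1):
            uu, vv = u0 + du, v0 + dv
            if 0 <= uu < W and 0 <= vv < H:
                w = (1 - abs(u - uu)) * (1 - abs(v - vv))  # bilinear weight
                out += w * feat[vv, uu]
    return out

def projective_attention(query, X, feats, Ps, W_off, W_att):
    """Fuse multi-view features for one joint query around its 3D estimate X.

    query: (C,); feats: V maps (H, W, C); Ps: V camera matrices (3, 4);
    W_off: (2K, C) predicts K pixel offsets; W_att: (K, C) predicts K logits.
    """
    K = W_att.shape[0]
    offsets = (W_off @ query).reshape(K, 2)
    logits = W_att @ query
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()  # softmax over the K sampling points
    fused = np.zeros(feats[0].shape[-1])
    for feat, P in zip(feats, Ps):
        center = project(P, X)  # where the joint lands in this view
        samples = np.stack([bilinear_sample(feat, center + o) for o in offsets])
        fused += attn @ samples  # attention-weighted sum of K samples
    return fused / len(feats)  # average over views

def ray_augment(feat, K, R):
    """Append unit world-frame viewing rays R^T K^{-1} [u, v, 1] to feat (H, W, C)."""
    H, W, _ = feat.shape
    vs, us = np.mgrid[0:H, 0:W]
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).astype(float)
    rays = pix @ np.linalg.inv(K).T @ R  # row-vector form of R^T K^{-1} p
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    return np.concatenate([feat, rays], axis=-1)  # a conv layer would follow

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, K_pts = 8, 4
    Kmat = np.array([[10.0, 0, 8], [0, 10.0, 8], [0, 0, 1]])
    Ps = [Kmat @ np.hstack([np.eye(3), t[:, None]])
          for t in (np.zeros(3), np.array([1.0, 0, 0]))]
    feats = [rng.normal(size=(16, 16, C)) for _ in Ps]
    q = hierarchical_queries(rng.normal(size=(2, C)), rng.normal(size=(15, C)))
    X = np.array([0.0, 0.0, 5.0])  # current 3D estimate of one joint
    out = projective_attention(q[0, 0], X, feats, Ps,
                               0.1 * rng.normal(size=(2 * K_pts, C)),
                               rng.normal(size=(K_pts, C)))
    print(out.shape)  # (8,)
```

The point of sampling only near each projection, rather than attending over every pixel of every view, is that the camera geometry restricts where a joint can appear, which is the kind of geometric guidance the abstract attributes to projective attention.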

Code Repositories


Direct Multi-view Multi-person 3D Human Pose Estimation