HandsFormer: Keypoint Transformer for Monocular 3D Pose Estimation of Hands and Object in Interaction

by Shreyas Hampali, et al.

We propose a robust and accurate method for estimating the 3D poses of two hands in close interaction from a single color image. This is a very challenging problem, as large occlusions and many confusions between the joints may occur. Our method starts by extracting a set of potential 2D locations for the joints of both hands as extrema of a heatmap. We do not require that all locations correctly correspond to a joint, nor that all the joints are detected. We use appearance and spatial encodings of these locations as input to a Transformer, and leverage the attention mechanisms to sort out the correct configuration of the joints and output the 3D poses of both hands. Our approach thus combines the recognition power of a Transformer with the accuracy of heatmap-based methods. We also show it can be extended to estimate the 3D pose of an object manipulated by one or two hands. We evaluate our approach on the recent and challenging InterHand2.6M and HO-3D datasets. We obtain a 17% improvement over the baseline. Moreover, we introduce the first dataset made of action sequences of two hands manipulating an object fully annotated in 3D and will make it publicly available.
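The first step described above, taking candidate joint locations as extrema of a heatmap, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `heatmap_local_maxima`, the `threshold` and `radius` parameters, and the toy heatmap are all assumptions made for illustration. The sketch only shows why such a step may return spurious or missing candidates, which the Transformer stage is then expected to sort out.

```python
def heatmap_local_maxima(heatmap, threshold=0.3, radius=1):
    """Return (row, col) cells above `threshold` that are not smaller
    than any neighbour within `radius` (hypothetical helper)."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            # Discard weak responses early; the threshold is an assumed value.
            if value <= threshold:
                continue
            # Collect the window around (r, c), clipped at the image border.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            # Keep the cell if nothing in its window is larger
            # (flat plateaus therefore yield several candidates).
            if value >= max(neighbours):
                peaks.append((r, c))
    return peaks


# Toy 8x8 heatmap with two strong responses and one weak one.
heatmap = [[0.0] * 8 for _ in range(8)]
heatmap[2][3] = 0.9
heatmap[5][6] = 0.7
heatmap[0][0] = 0.2  # below the threshold, so it is ignored
candidates = heatmap_local_maxima(heatmap)
```

Each candidate location would then be paired with an appearance feature and a spatial encoding before being passed to the Transformer; that stage is not sketched here because the abstract gives no further detail about it.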



Learning Sequential Contexts using Transformer for 3D Hand Pose Estimation

3D hand pose estimation (HPE) is the process of locating the joints of t...

BB8: A Scalable, Accurate, Robust to Partial Occlusion Method for Predicting the 3D Poses of Challenging Objects without Using Depth

We introduce a novel method for 3D object detection and pose estimation ...

Segmentation-driven 6D Object Pose Estimation

The most recent trend in estimating the 6D pose of rigid objects has bee...

Making Deep Heatmaps Robust to Partial Occlusions for 3D Object Pose Estimation

We introduce a novel method for robust and accurate 3D object pose estim...

ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation via Online Exploration and Synthesis

Estimating the articulated 3D hand-object pose from a single RGB image i...

A Dual-Source Attention Transformer for Multi-Person Pose Tracking

Multi-person pose tracking is an important element for many applications...

Reconstructing Vehicles from a Single Image: Shape Priors for Road Scene Understanding

We present an approach for reconstructing vehicles from a single (RGB) i...
