Neural Voting Field for Camera-Space 3D Hand Pose Estimation

by Lin Huang, et al.

We present a unified framework for camera-space 3D hand pose estimation from a single RGB image, based on a 3D implicit representation. Most recent works first adopt holistic or pixel-level dense regression to obtain a relative 3D hand pose and then apply complex second-stage operations to recover the 3D global root or scale. In contrast, we propose a novel unified 3D dense regression scheme that estimates camera-space 3D hand pose via dense 3D point-wise voting in the camera frustum. Through direct dense modeling in the 3D domain, inspired by Pixel-aligned Implicit Functions for detailed 3D reconstruction, our proposed Neural Voting Field (NVF) fully models 3D dense local evidence and global hand geometry, helping to alleviate common 2D-to-3D ambiguities. Specifically, for a 3D query point in the camera frustum and its pixel-aligned image feature, NVF, represented by a Multi-Layer Perceptron, regresses: (i) the point's signed distance to the hand surface; (ii) a set of 4D offset vectors (a 1D voting weight and a 3D directional vector to each hand joint). Following a vote-casting scheme, the 4D offset vectors from near-surface points are selected to calculate the 3D hand joint coordinates by a weighted average. Experiments demonstrate that NVF outperforms existing state-of-the-art algorithms on the FreiHAND dataset for camera-space 3D hand pose estimation. We also adapt NVF to the classic task of root-relative 3D hand pose estimation, for which NVF also obtains state-of-the-art results on the HO3D dataset.
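The vote-casting step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `cast_votes`, the array layout, and the `sdf_threshold` value are hypothetical. It also assumes that each 3D vector is a full displacement from the query point to the joint, and that "near-surface" means a small absolute signed distance; the abstract does not specify either detail.

```python
import numpy as np

def cast_votes(points, sdf, offsets, weights, sdf_threshold=0.01):
    """Weighted-average voting for 3D joints (illustrative sketch only).

    points  : (N, 3) query points sampled in the camera frustum
    sdf     : (N,)   predicted signed distance of each point to the hand surface
    offsets : (N, J, 3) predicted vectors from each point to each of J joints
              (assumed here to be full displacements, not unit directions)
    weights : (N, J) predicted non-negative voting weights
    """
    # Keep only points close to the hand surface (assumed selection rule).
    near = np.abs(sdf) < sdf_threshold
    if not near.any():
        raise ValueError("no near-surface points; consider a larger threshold")

    # Each selected point votes for every joint: point + offset.
    votes = points[near, None, :] + offsets[near]      # (M, J, 3)
    w = weights[near]                                  # (M, J)

    # Per-joint weighted average of the votes.
    w_sum = np.maximum(w.sum(axis=0), 1e-8)            # (J,), guards against /0
    joints = (w[..., None] * votes).sum(axis=0) / w_sum[:, None]
    return joints                                      # (J, 3)
```

In this sketch, points far from the surface contribute nothing, so unreliable predictions in empty regions of the frustum do not affect the joint estimate.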



JGR-P2O: Joint Graph Reasoning based Pixel-to-Offset Prediction Network for 3D Hand Pose Estimation from a Single Depth Image

State-of-the-art single depth image-based 3D hand pose estimation method...

Point-to-Pose Voting based Hand Pose Estimation using Residual Permutation Equivariant Layer

Recently, 3D input data based hand pose estimation methods have shown st...

Dense 3D Regression for Hand Pose Estimation

We present a simple and effective method for 3D hand pose estimation fro...

Vote from the Center: 6 DoF Pose Estimation in RGB-D Images by Radial Keypoint Voting

We propose a novel keypoint voting scheme based on intersecting spheres,...

PVNet: Pixel-wise Voting Network for 6DoF Pose Estimation

This paper addresses the challenge of 6DoF pose estimation from a single...

Mask2Hand: Learning to Predict the 3D Hand Pose and Shape from Shadow

We present a self-trainable method, Mask2Hand, which learns to solve the...

Nerfels: Renderable Neural Codes for Improved Camera Pose Estimation

This paper presents a framework that combines traditional keypoint-based...
