HARP: Personalized Hand Reconstruction from a Monocular RGB Video

by Korrawe Karunratanakul et al.
ETH Zurich

We present HARP (HAnd Reconstruction and Personalization), a personalized hand avatar creation approach that takes a short monocular RGB video of a human hand as input and reconstructs a faithful hand avatar exhibiting high-fidelity appearance and geometry. In contrast to the prevailing trend of neural implicit representations, HARP models a hand with a mesh-based parametric hand model, a vertex displacement map, a normal map, and an albedo, without any neural components. As validated by our experiments, the explicit nature of our representation enables a truly scalable, robust, and efficient approach to hand avatar creation. HARP is optimized via gradient descent from a short sequence captured by a hand-held mobile phone and can be directly used in AR/VR applications with real-time rendering capability. To enable this, we carefully design and implement a shadow-aware differentiable rendering scheme that is robust to the high-degree articulations and self-shadowing regularly present in hand motion sequences, as well as to challenging lighting conditions. HARP also generalizes to unseen poses and novel viewpoints, producing photo-realistic renderings of hand animations performing highly articulated motions. Furthermore, the learned HARP representation can be used to improve 3D hand pose estimation quality in challenging viewpoints. The key advantages of HARP are validated by in-depth analyses of appearance reconstruction, novel-view and novel-pose synthesis, and 3D hand pose refinement. It is an AR/VR-ready personalized hand representation that shows superior fidelity and scalability.
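The abstract describes an explicit, non-neural representation: a parametric mesh plus per-vertex displacements and an albedo, personalized by plain gradient descent on an image-based loss. The toy sketch below illustrates that idea in miniature, assuming a simplified setting: random base vertices stand in for the parametric hand model, and residuals are computed directly on vertices and colors rather than through a differentiable renderer. All names (`deformed`, `displacement`, `albedo`) are illustrative, not the paper's code.

```python
import numpy as np

# Minimal sketch of an explicit avatar: base mesh + per-vertex
# displacements along normals + per-vertex albedo, fit by gradient
# descent. Toy data replaces the MANO mesh and the renderer.
rng = np.random.default_rng(0)
n_verts = 100
base_verts = rng.normal(size=(n_verts, 3))
normals = np.ones((n_verts, 3)) / np.sqrt(3.0)  # toy unit normals

# Learnable explicit parameters (no neural network involved)
displacement = np.zeros(n_verts)                # scalar offset per vertex
albedo = np.full((n_verts, 3), 0.5)             # RGB albedo per vertex

def deformed(base, normals, disp):
    """Personalized geometry: base vertices offset along their normals."""
    return base + disp[:, None] * normals

# Toy "observation": the geometry and colors we want to recover
target_disp = rng.uniform(0.0, 0.1, size=n_verts)
target_albedo = rng.uniform(0.2, 0.9, size=(n_verts, 3))
target_verts = deformed(base_verts, normals, target_disp)

lr = 0.5
for _ in range(200):
    verts = deformed(base_verts, normals, displacement)
    g_res = verts - target_verts   # geometric residual, (V, 3)
    a_res = albedo - target_albedo # appearance residual, (V, 3)
    # Analytic gradients of 0.5 * sum of squared residuals;
    # d(verts)/d(disp) is the vertex normal, so project onto it.
    displacement -= lr * (g_res * normals).sum(axis=1)
    albedo -= lr * a_res

geo_err = np.abs(displacement - target_disp).max()
alb_err = np.abs(albedo - target_albedo).max()
```

In the actual method, the residuals would come from comparing shadow-aware differentiable renderings against the captured video frames, but the optimization loop has the same explicit, gradient-descent structure.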


