Cross-modal Deep Variational Hand Pose Estimation

03/30/2018
by Adrian Spurr, et al.

The human hand moves in complex, high-dimensional ways, which makes estimating 3D hand pose from images alone a challenging task. In this work we propose a method to learn a statistical hand model, represented by a cross-modally trained latent space, via a generative deep neural network. We derive an objective function from the variational lower bound of the VAE framework and jointly optimize the resulting cross-modal KL-divergence and the posterior reconstruction objective. This naturally admits a training regime that leads to a coherent latent space across multiple modalities such as RGB images, 2D keypoint detections, or 3D hand configurations. It also provides a straightforward way to use semi-supervision. The latent space can be used directly to estimate 3D hand poses from RGB images, outperforming the state of the art in different settings. Furthermore, we show that the proposed method can be applied without changes to depth images and performs comparably to specialized methods. Finally, the model is fully generative and can synthesize consistent pairs of hand configurations across modalities. We evaluate our method on both RGB and depth datasets and analyze the latent space qualitatively.
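To make the objective described above more concrete, the following is a minimal NumPy sketch of a cross-modal VAE loss: one modality is encoded into a latent Gaussian, a sample from it is decoded into a different modality, and a KL term pulls the posterior toward a prior. Everything specific here is an assumption for illustration and is not taken from the abstract: the standard-normal prior, the squared-error reconstruction term, the `beta` weight, the linear encoder and decoder, the latent size of 8, and the 21-joint layout (42 values for 2D keypoints, 63 for 3D joints).

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def cross_modal_loss(x_src, x_tgt, encode_src, decode_tgt, rng, beta=1.0):
    """Negative lower bound: encode modality `src`, reconstruct modality `tgt`.

    Squared error stands in for the reconstruction log-likelihood and `beta`
    weights the KL term; both are assumptions of this sketch.
    """
    mu, log_var = encode_src(x_src)
    z = reparameterize(mu, log_var, rng)
    recon_err = np.sum((decode_tgt(z) - x_tgt) ** 2, axis=-1)
    kl = kl_to_standard_normal(mu, log_var)
    return float(np.mean(recon_err + beta * kl))


# Toy setup (hypothetical): 2D keypoints -> latent -> 3D joints, linear maps.
rng = np.random.default_rng(0)
latent_dim, dim_2d, dim_3d = 8, 42, 63
W_mu = 0.1 * rng.standard_normal((dim_2d, latent_dim))
W_lv = 0.1 * rng.standard_normal((dim_2d, latent_dim))
W_dec = 0.1 * rng.standard_normal((latent_dim, dim_3d))


def encode_2d(x):
    # Returns the mean and log-variance of the approximate posterior.
    return x @ W_mu, x @ W_lv


def decode_3d(z):
    # Maps a latent sample to a 3D joint vector.
    return z @ W_dec


x_2d = rng.standard_normal((4, dim_2d))  # batch of 4 fake 2D poses
x_3d = rng.standard_normal((4, dim_3d))  # matching fake 3D poses
loss = cross_modal_loss(x_2d, x_3d, encode_2d, decode_3d, rng)
```

Under these assumptions, training over several (source, target) pairs that share one latent space, for example RGB to 3D and 3D to 3D, is what would tie the modalities together. The same loss function can be reused by swapping in a different encoder or decoder.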


