THOR-Net: End-to-end Graformer-based Realistic Two Hands and Object Reconstruction with Self-supervision

by Ahmed Tawfik Aboukhadra, et al.

Realistic reconstruction of two hands interacting with objects is a new and challenging problem that is essential for building personalized Virtual and Augmented Reality environments. Graph Convolutional Networks (GCNs) preserve the topology of hand poses and shapes by modeling them as graphs. In this work, we propose THOR-Net, which combines the power of GCNs, Transformers, and self-supervision to realistically reconstruct two hands and an object from a single RGB image. Our network comprises two stages, namely the feature extraction stage and the reconstruction stage. In the feature extraction stage, a Keypoint RCNN is used to extract 2D poses, feature maps, heatmaps, and bounding boxes from a monocular RGB image. Thereafter, this 2D information is modeled as two graphs and passed to the two branches of the reconstruction stage. The shape reconstruction branch estimates meshes of two hands and an object using our novel coarse-to-fine GraFormer shape network. The other branch reconstructs the 3D poses of the hands and the object using a GraFormer network. Finally, a self-supervised photometric loss is used to directly regress the realistic texture of each vertex in the hands' meshes. Our approach achieves state-of-the-art results in hand shape estimation on the HO-3D dataset (10.0 mm), outperforming ArtiBoost (10.8 mm). It also surpasses other methods in hand pose estimation on the challenging two hands and object (H2O) dataset by 5 mm on the left-hand pose and 1 mm on the right-hand pose.
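To illustrate what modeling 2D keypoints as a graph can look like, the following is a minimal sketch of a hand-skeleton graph and a single generic graph-convolution step. It is not the THOR-Net or GraFormer implementation: the 21-keypoint layout (wrist plus four joints per finger), the symmetric adjacency normalization, the placeholder keypoints, and all function names are assumptions chosen for illustration only.

```python
import math

# Assumption: a 21-keypoint hand layout (0 = wrist, then 4 joints per finger).
NUM_JOINTS = 21


def hand_edges():
    """Return the (parent, child) bone list for the assumed skeleton."""
    edges = []
    for finger in range(5):
        base = 1 + 4 * finger
        edges.append((0, base))  # wrist to first joint of the finger
        for j in range(base, base + 3):
            edges.append((j, j + 1))  # consecutive joints along the finger
    return edges


def normalized_adjacency(num_nodes, edges):
    """Build D^{-1/2} (A + I) D^{-1/2} as a dense list of lists."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
         for i in range(num_nodes)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(x, y):
    """Plain dense matrix product for lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def graph_conv(a_hat, features, weight):
    """One generic graph-convolution step: ReLU(A_hat @ X @ W)."""
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Usage with placeholder data (hypothetical values, not real keypoints).
edges = hand_edges()
a_hat = normalized_adjacency(NUM_JOINTS, edges)
keypoints_2d = [[0.1 * i, 0.05 * i] for i in range(NUM_JOINTS)]
weight = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]  # maps 2 input channels to 3
lifted = graph_conv(a_hat, keypoints_2d, weight)
```

Each output row mixes a joint's own features with those of its skeletal neighbors, which is the sense in which a graph layer respects hand topology. In a learned model the weight matrix would be trained rather than fixed as it is here.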




3D Hand Shape and Pose Estimation from a Single RGB Image

This work addresses a novel and challenging problem of estimating the fu...

Monocular 3D Reconstruction of Interacting Hands via Collision-Aware Factorized Refinements

3D interacting hand reconstruction is essential to facilitate human-mach...

Collaborative Learning for Hand and Object Reconstruction with Attention-guided Graph Convolution

Estimating the pose and shape of hands and objects under interaction fin...

gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction

Signed distance functions (SDFs) is an attractive framework that has rec...

Joint Hand-object 3D Reconstruction from a Single Image with Cross-branch Feature Fusion

Accurate 3D reconstruction of the hand and object shape from a hand-obje...

Coarse-to-fine Animal Pose and Shape Estimation

Most existing animal pose and shape estimation approaches reconstruct an...

Attention-based 3D Object Reconstruction from a Single Image

Recently, learning-based approaches for 3D reconstruction from 2D images...
