Self-Supervised 3D Hand Pose Estimation from Monocular RGB via Contrastive Learning

06/10/2021
by Adrian Spurr, et al.

Acquiring accurate 3D annotated data for hand pose estimation is notoriously difficult. It typically requires complex multi-camera setups and controlled conditions, which in turn create a domain gap that is hard to bridge to fully unconstrained settings. Encouraged by the success of contrastive learning on image classification tasks, we propose a new self-supervised method for the structured regression task of 3D hand pose estimation. Contrastive learning uses unlabeled data for representation learning via a loss formulation that encourages the learned feature representations to be invariant under any image transformation. For 3D hand pose estimation, invariance to appearance transformations such as color jitter is likewise desirable. However, the task requires equivariance under affine transformations such as rotation and translation, because the target pose changes whenever the image is rotated or translated. To address this issue, we propose an equivariant contrastive objective and demonstrate its effectiveness in the context of 3D hand pose estimation. We experimentally investigate the impact of invariant and equivariant contrastive objectives and show that learning equivariant features leads to better representations for the task of 3D hand pose estimation. Furthermore, we show that a standard ResNet-152, trained on additional unlabeled data, attains an improvement of 7.6% in PA-EPE on FreiHAND and thus achieves state-of-the-art performance without any task-specific, specialized architectures.
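To make the distinction between the invariant and equivariant objectives concrete, the following is a minimal NumPy sketch, not the authors' implementation. It rests on several assumptions the abstract does not state: the contrastive loss is taken to be the common NT-Xent form, the geometric augmentation is restricted to in-plane rotation, each embedding is interpreted as a set of 2D points so that a rotation can be undone in feature space, and the temperature of 0.5 is arbitrary. The names `rotate_embedding`, `nt_xent`, and `equivariant_nt_xent` are hypothetical helpers introduced for illustration.

```python
import numpy as np


def rotate_embedding(z, thetas):
    """Rotate each embedding, viewed as d/2 points in the plane, by its own angle.

    Assumption: the embedding dimension d is even and pairs of consecutive
    features are treated as (x, y) coordinates.
    """
    n, d = z.shape
    points = z.reshape(n, d // 2, 2)
    cos, sin = np.cos(thetas), np.sin(thetas)
    # One 2x2 rotation matrix per sample, shape (n, 2, 2).
    rot = np.stack([np.stack([cos, -sin], axis=-1),
                    np.stack([sin, cos], axis=-1)], axis=-2)
    rotated = np.einsum("nij,nkj->nki", rot, points)
    return rotated.reshape(n, d)


def nt_xent(z_a, z_b, temperature=0.5):
    """Invariant objective: pull the two views of each sample together."""
    n = len(z_a)
    z = np.concatenate([z_a, z_b])
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a sample is never compared with itself
    # Row i of the first view is paired with row i of the second view.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(log_norm - sim[np.arange(2 * n), positives]))


def equivariant_nt_xent(z_a, z_b, thetas_a, thetas_b, temperature=0.5):
    """Equivariant objective (sketch): undo each view's rotation in feature
    space, then apply the same contrastive loss to the aligned embeddings."""
    aligned_a = rotate_embedding(z_a, -thetas_a)
    aligned_b = rotate_embedding(z_b, -thetas_b)
    return nt_xent(aligned_a, aligned_b, temperature)


# Simulate a perfectly rotation-equivariant encoder on 8 samples.
rng = np.random.default_rng(0)
canonical = rng.normal(size=(8, 16))
thetas_a = rng.uniform(-np.pi, np.pi, size=8)
thetas_b = rng.uniform(-np.pi, np.pi, size=8)
z_a = rotate_embedding(canonical, thetas_a)
z_b = rotate_embedding(canonical, thetas_b)

print("invariant loss:  ", nt_xent(z_a, z_b))
print("equivariant loss:", equivariant_nt_xent(z_a, z_b, thetas_a, thetas_b))
```

In this sketch the invariant loss penalizes the encoder for retaining rotation information, whereas the equivariant variant only asks that the two views agree once their known rotations have been undone. Appearance augmentations such as color jitter need no such correction, since invariance to them is the desired behavior.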


research
10/30/2022

Image-free Domain Generalization via CLIP for 3D Hand Pose Estimation

RGB-based 3D hand pose estimation has been successful for decades thanks...
research
09/01/2022

TempCLR: Reconstructing Hands via Time-Coherent Contrastive Learning

We introduce TempCLR, a new time-coherent contrastive learning approach ...
research
10/13/2020

Self-Supervised Multi-View Synchronization Learning for 3D Pose Estimation

Current state-of-the-art methods cast monocular 3D human pose estimation...
research
01/12/2021

Explicit homography estimation improves contrastive self-supervised learning

The typical contrastive self-supervised algorithm uses a similarity meas...
research
02/05/2020

Rotation-invariant Mixed Graphical Model Network for 2D Hand Pose Estimation

In this paper, we propose a new architecture named Rotation-invariant Mi...
research
08/31/2021

ScatSimCLR: self-supervised contrastive learning with pretext task regularization for small-scale datasets

In this paper, we consider a problem of self-supervised learning for sma...
research
03/21/2022

A Contrastive Objective for Learning Disentangled Representations

Learning representations of images that are invariant to sensitive or un...
