Self-supervised Learning of Motion Capture

12/04/2017
by Hsiao-Yu Fish Tung, et al.

Current state-of-the-art solutions for motion capture from a single camera are optimization driven: they optimize the parameters of a 3D human model so that its re-projection matches measurements in the video (e.g., person segmentation, optical flow, keypoint detections). Optimization models are susceptible to local minima; this bottleneck has forced the use of clean, green-screen-like backgrounds at capture time, manual initialization, or multiple cameras as input. In this work, we propose a learning-based motion capture model for single-camera input. Instead of optimizing mesh and skeleton parameters directly, our model optimizes neural network weights that predict 3D shape and skeleton configurations given a monocular RGB video. Our model is trained end-to-end using a combination of strong supervision from synthetic data and self-supervision from differentiable rendering of (a) skeletal keypoints, (b) dense 3D mesh motion, and (c) human-background segmentation. Empirically, we show that our model combines the best of both worlds of supervised learning and test-time optimization: supervised learning initializes the model parameters in the right regime, ensuring good pose and surface initialization at test time without manual effort, while self-supervision by back-propagating through differentiable rendering allows (unsupervised) adaptation of the model to the test data and offers a much tighter fit than a pretrained fixed model. We show that the proposed model improves with experience and converges to low-error solutions where previous optimization methods fail.
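
The training recipe described above can be pictured as a per-video adaptation loop: a network pretrained on synthetic data predicts mesh vertices, skeleton joints, and camera parameters, and its weights are then refined on the unlabeled test video by back-propagating keypoint, dense-motion, and segmentation re-projection errors through differentiable rendering. The PyTorch sketch below is a minimal illustration of that idea, not the authors' released code; model, project_keypoints, mesh_flow, and render_silhouette are hypothetical placeholders for the network and the differentiable rendering operators.

```python
import torch

def adapt_to_test_video(model, frames, obs_kpts, obs_flow, obs_masks,
                        steps=100, lr=1e-5):
    """Self-supervised test-time adaptation (hypothetical sketch):
    fine-tune network weights on one unlabeled video by minimizing
    differentiable re-projection losses."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        # Predict per-frame 3D mesh vertices, skeleton joints, and camera.
        verts, joints, cam = model(frames)

        # (a) Skeletal keypoints: projected 3D joints should match the
        #     2D keypoint detections.
        loss_kpt = ((project_keypoints(joints, cam) - obs_kpts) ** 2).mean()

        # (b) Dense 3D mesh motion: projected vertex displacements between
        #     consecutive frames should match the observed optical flow.
        loss_flow = (mesh_flow(verts, cam) - obs_flow).abs().mean()

        # (c) Human-background segmentation: the rendered silhouette of the
        #     predicted mesh should match the person segmentation mask.
        loss_seg = (render_silhouette(verts, cam) - obs_masks).abs().mean()

        loss = loss_kpt + loss_flow + loss_seg
        opt.zero_grad()
        loss.backward()  # gradients flow through the differentiable renderers
        opt.step()
    return model
```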

Related research

Cross Pixel Optical Flow Similarity for Self-Supervised Learning (07/15/2018)
We propose a novel method for learning convolutional neural image repres...

A-NeRF: Surface-free Human 3D Pose Refinement via Neural Rendering (02/11/2021)
While deep learning has reshaped the classical motion capture pipeline, ...

Monocular Depth Estimation with Self-supervised Instance Adaptation (04/13/2020)
Recent advances in self-supervised learning have demonstrated that it is ...

Neural Descent for Visual 3D Human Pose and Shape (08/16/2020)
We present deep neural network methodology to reconstruct the 3d pose an...

Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera (07/12/2019)
We present GLNet, a self-supervised framework for learning depth, optica...

Learning monocular 3D reconstruction of articulated categories from motion (03/30/2021)
Monocular 3D reconstruction of articulated object categories is challeng...

Open-World Pose Transfer via Sequential Test-Time Adaption (03/20/2023)
Pose transfer aims to transfer a given person into a specified posture, ...
