Neural Scene Decomposition for Multi-Person Motion Capture

03/13/2019
by Helge Rhodin, et al.

Learning general image representations has proven key to the success of many computer vision tasks. For example, many approaches to image understanding problems rely on deep networks that were initially trained on ImageNet, mostly because the learned features are a valuable starting point for learning from limited labeled data. However, when it comes to 3D motion capture of multiple people, these features are of only limited use. In this paper, we therefore propose an approach to learning features that are useful for this purpose. To this end, we introduce a self-supervised approach to learning what we call a neural scene decomposition (NSD) that can be exploited for 3D pose estimation. NSD comprises three layers of abstraction to represent human subjects: spatial layout in terms of bounding boxes and relative depth; a 2D shape representation in terms of an instance segmentation mask; and subject-specific appearance and 3D pose information. By exploiting self-supervision from multiview data, our NSD model can be trained end-to-end without any 2D or 3D supervision. In contrast to previous approaches, it works for multiple persons and full-frame images. Because it encodes 3D geometry, NSD can then be effectively leveraged to train a 3D pose estimation network from small amounts of annotated data.
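
To make the three layers of abstraction concrete, the following is a minimal sketch of how one subject's decomposition might be held in memory. It is not the authors' implementation: the class names, field names, box convention, and depth convention are all assumptions introduced for illustration, and the abstract does not specify the sizes or formats of the appearance and pose encodings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubjectLayout:
    """Layer 1: spatial layout (bounding box plus relative depth)."""
    # Hypothetical convention: (x, y, width, height) in pixels.
    box_xywh: Tuple[float, float, float, float]
    # Hypothetical convention: a smaller value means closer to the camera.
    relative_depth: float


@dataclass
class SubjectDecomposition:
    """One person, described at the three levels named in the abstract."""
    layout: SubjectLayout         # layer 1: bounding box and relative depth
    mask: List[List[float]]       # layer 2: instance mask, values in [0, 1]
    appearance_code: List[float]  # layer 3: subject-specific appearance
    pose_code: List[float]        # layer 3: 3D pose information


def back_to_front(subjects: List[SubjectDecomposition]) -> List[SubjectDecomposition]:
    """Order subjects farthest first, so nearer ones can be drawn on top."""
    return sorted(subjects, key=lambda s: s.layout.relative_depth, reverse=True)
```

Under these assumptions, a full-frame image with several people would map to a list of `SubjectDecomposition` objects, and the relative depth in the first layer is what allows overlapping subjects to be ordered when their masks intersect.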


