Self-supervised Learning of Pose Embeddings from Spatiotemporal Relations in Videos

08/07/2017
by Ömer Sümer, et al.

Human pose analysis is presently dominated by deep convolutional networks trained with extensive manual annotations of joint locations and beyond. To avoid the need for expensive labeling, we exploit spatiotemporal relations in training videos for self-supervised learning of pose embeddings. The key idea is to combine temporal ordering and spatial placement estimation as auxiliary tasks for learning pose similarities in a Siamese convolutional network. Since the self-supervised sampling of both tasks from natural videos can result in ambiguous and incorrect training labels, our method employs a curriculum learning strategy that starts training with the most reliable data samples and gradually increases the difficulty. To further refine the training process, we mine repetitive poses in individual videos, which provide reliable labels while removing inconsistencies. Our pose embeddings capture visual characteristics of human pose that can boost existing supervised representations in human pose estimation and retrieval. We report quantitative and qualitative results on these tasks on the Olympic Sports, Leeds Sports Pose and MPII Human Pose datasets.
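To make the temporal ordering task and the curriculum idea concrete, here is a minimal Python sketch of how frame pairs might be sampled. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `easy_gap` and `hard_gap` parameters, and the choice of frame gap as a proxy for label reliability are all hypothetical. The Siamese network and the spatial placement task are not shown.

```python
import random


def make_ordering_pairs(num_frames, min_gap):
    """Return (i, j, label) for frame pairs at least `min_gap` frames apart.

    label is 1 when frame j comes after frame i, otherwise 0.
    """
    pairs = []
    for i in range(num_frames):
        for j in range(num_frames):
            if abs(i - j) >= min_gap:
                pairs.append((i, j, 1 if j > i else 0))
    return pairs


def curriculum_min_gap(epoch, num_epochs, easy_gap, hard_gap):
    """Linearly move the required frame gap from easy_gap to hard_gap.

    Assumption (not from the paper): pairs that are far apart in time
    are treated as the more reliable, "easy" samples.
    """
    if num_epochs <= 1:
        return hard_gap
    t = epoch / (num_epochs - 1)
    return round(easy_gap + t * (hard_gap - easy_gap))


def sample_batch(num_frames, epoch, num_epochs, batch_size, rng,
                 easy_gap=8, hard_gap=2):
    """Sample ordering pairs whose difficulty depends on the current epoch."""
    gap = curriculum_min_gap(epoch, num_epochs, easy_gap, hard_gap)
    pairs = make_ordering_pairs(num_frames, gap)
    return rng.sample(pairs, min(batch_size, len(pairs)))


if __name__ == "__main__":
    rng = random.Random(0)
    for epoch in range(4):
        batch = sample_batch(20, epoch, 4, 3, rng)
        print(epoch, batch)
```

Early epochs draw only widely separated frame pairs; later epochs admit closer pairs, whose ordering is harder to tell apart. In a full pipeline, each sampled pair would index two person crops fed to the two branches of the Siamese network.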


research
04/09/2020

Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis

Camera captured human pose is an outcome of several sources of variation...
research
04/05/2023

Self-supervised 3D Human Pose Estimation from a Single Image

We propose a new self-supervised method for predicting 3D human body pos...
research
09/20/2023

Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation

As 3D human pose estimation can now be achieved with very high accuracy ...
research
12/06/2020

Temporal-Aware Self-Supervised Learning for 3D Hand Pose and Mesh Estimation in Videos

Estimating 3D hand pose directly from RGB images is challenging but has g...
research
04/26/2022

Context-Aware Sequence Alignment using 4D Skeletal Augmentation

Temporal alignment of fine-grained human actions in videos is important ...
research
01/12/2019

3D Human Pose Machines with Self-supervised Learning

Driven by recent computer vision and robotic applications, recovering 3D...
research
03/28/2016

Shuffle and Learn: Unsupervised Learning using Temporal Order Verification

In this paper, we present an approach for learning a visual representati...
