Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization

12/02/2020
by Long Zhao, et al.

We introduce a novel representation learning method to disentangle pose-dependent and view-dependent factors from 2D human poses. The method trains a network using cross-view mutual information maximization (CV-MIM). In a contrastive learning manner, CV-MIM maximizes the mutual information between representations of the same pose observed from different viewpoints. We further propose two regularization terms to ensure disentanglement and smoothness of the learned representations. The resulting pose representations can be used for cross-view action recognition. To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition. This task trains models on actions captured from only a single viewpoint and evaluates them on poses captured from all possible viewpoints. We evaluate the learned representations on standard benchmarks for action recognition and show that (i) CV-MIM performs competitively with state-of-the-art models in the fully-supervised scenarios; (ii) CV-MIM outperforms other competing methods by a large margin in the single-shot cross-view setting; and (iii) the learned representations can significantly boost performance when the amount of supervised training data is reduced.
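The abstract does not spell out the exact form of the CV-MIM objective. As a rough illustration only, the sketch below implements a generic InfoNCE-style contrastive loss. This is a widely used estimator for maximizing mutual information, applied here to embeddings of the same poses seen from two viewpoints. The function names `l2_normalize` and `cross_view_infonce`, the cosine-similarity critic, and the `temperature` value are all assumptions, not the authors' implementation. The two regularization terms are omitted because the abstract does not define them.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row of x to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def cross_view_infonce(z_a, z_b, temperature=0.1):
    """Generic InfoNCE loss between two views (hypothetical sketch, not CV-MIM itself).

    z_a[i] and z_b[i] are embeddings of the same pose from two viewpoints
    (the positive pair); every z_b[j] with j != i serves as a negative.
    Returns the mean loss and the InfoNCE lower bound on mutual information,
    I(z_a; z_b) >= log(N) - loss.
    """
    z_a = l2_normalize(z_a)
    z_b = l2_normalize(z_b)
    logits = z_a @ z_b.T / temperature  # (N, N) scaled cosine similarities
    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -np.mean(np.diag(log_probs))  # positives lie on the diagonal
    mi_lower_bound = np.log(len(z_a)) - loss
    return loss, mi_lower_bound


# Toy usage: two noisy "views" of the same 8 random pose embeddings.
rng = np.random.default_rng(0)
poses = rng.normal(size=(8, 16))
view_a = poses + 0.05 * rng.normal(size=poses.shape)
view_b = poses + 0.05 * rng.normal(size=poses.shape)

loss_matched, mi_matched = cross_view_infonce(view_a, view_b)
# Rolling the rows breaks every positive pair, so the loss should rise.
loss_shuffled, _ = cross_view_infonce(view_a, np.roll(view_b, 1, axis=0))
print(loss_matched, loss_shuffled, mi_matched)
```

Minimizing this loss pulls the two views of each pose together and pushes apart views of different poses. This mirrors the "same pose, different viewpoints" idea in the abstract. Many practical variants symmetrize the loss by averaging it over both directions (`z_a` to `z_b` and `z_b` to `z_a`).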

09/23/2022

View-Invariant Skeleton-based Action Recognition via Global-Local Contrastive Learning

Skeleton-based human action recognition has been drawing more interest r...
07/14/2020

Unsupervised Human 3D Pose Representation with Viewpoint and Pose Disentanglement

Learning a good 3D human pose representation is important for human pose...
09/17/2021

Unsupervised View-Invariant Human Posture Representation

Most recent view-invariant action recognition and performance assessment...
04/29/2021

3D Human Action Representation Learning via Cross-View Consistency Pursuit

In this work, we propose a Cross-view Contrastive Learning framework for...
04/27/2022

Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training

Recently, much progress has been made for self-supervised action recogni...
02/02/2016

Learning a Deep Model for Human Action Recognition from Novel Viewpoints

Recognizing human actions from unknown and unseen (novel) views is a cha...
04/01/2016

Learning a Pose Lexicon for Semantic Action Recognition

This paper presents a novel method for learning a pose lexicon comprisin...