Capsules as viewpoint learners for human pose estimation

02/13/2023
by Nicola Garau, et al.

The task of human pose estimation (HPE) deals with the ill-posed problem of estimating the 3D position of human joints directly from images and videos. In recent literature, most works tackle the problem using convolutional neural networks (CNNs), which achieve state-of-the-art results on most datasets. We show that most neural networks do not generalize well when the camera is subject to significant viewpoint changes. This behaviour emerges because CNNs cannot model viewpoint equivariance and instead rely on viewpoint invariance, which results in high data dependency. Recently, capsule networks (CapsNets) have been proposed in the multi-class classification field as a solution to the viewpoint equivariance issue, reducing the size and complexity of both the training datasets and the network itself. In this work, we show how capsule networks can be adopted to achieve viewpoint equivariance in human pose estimation. We propose a novel end-to-end viewpoint-equivariant capsule autoencoder that employs fast Variational Bayes routing and matrix capsules. We achieve state-of-the-art results for multiple tasks and datasets while retaining other desirable properties, such as better generalization across viewpoint changes, lower data dependency and fast inference. Additionally, by modelling each joint as a capsule, the hierarchical and geometrical structure of the overall pose is retained in the feature space, independently of the viewpoint. We further test our network on multiple datasets, in both the RGB and depth domains, from seen and unseen viewpoints and in the viewpoint transfer task.
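The distinction between viewpoint equivariance and viewpoint invariance can be illustrated with a small NumPy sketch. This is not the authors' network: the toy pose, the `rotation_z` helper, and the two example maps (`root_relative` and `bone_lengths`) are assumptions chosen only to show the two properties. An equivariant map changes in step with the viewpoint, so f(Rx) = R f(x), while an invariant map discards the viewpoint, so g(Rx) = g(x).

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix about the z axis (stands in for a camera viewpoint change)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def root_relative(joints):
    """Equivariant map: express every joint relative to joint 0."""
    return joints - joints[0]

def bone_lengths(joints, bones):
    """Invariant map: Euclidean length of each (parent, child) bone."""
    return np.array([np.linalg.norm(joints[a] - joints[b]) for a, b in bones])

# Hypothetical toy pose with three joints, one row per joint (x, y, z).
joints = np.array([[0.0, 0.0, 1.0],
                   [0.0, 0.2, 1.5],
                   [0.3, 0.2, 1.5]])
bones = [(0, 1), (1, 2)]

R = rotation_z(np.pi / 3)
rotated = joints @ R.T  # the same pose seen from a rotated viewpoint

# Equivariance: transforming the input transforms the output the same way.
equivariant_ok = np.allclose(root_relative(rotated), root_relative(joints) @ R.T)

# Invariance: the output does not change with the viewpoint.
invariant_ok = np.allclose(bone_lengths(rotated, bones), bone_lengths(joints, bones))
```

The invariant output cannot tell the two viewpoints apart, whereas the equivariant output keeps the viewpoint information in a predictable form, which is the property the abstract attributes to capsule networks.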


research
08/19/2021

DECA: Deep viewpoint-Equivariant human pose estimation using Capsule Autoencoders

Human Pose Estimation (HPE) aims at retrieving the 3D position of human ...
research
03/23/2016

Towards Viewpoint Invariant 3D Human Pose Estimation

We propose a viewpoint invariant model for 3D human pose estimation from...
research
09/18/2016

Learning camera viewpoint using CNN to improve 3D body pose estimation

The objective of this work is to estimate 3D human pose from a single RG...
research
09/13/2016

Crafting a multi-task CNN for viewpoint estimation

Convolutional Neural Networks (CNNs) were recently shown to provide stat...
research
07/15/2020

On the Capability of Neural Networks to Generalize to Unseen Category-Pose Combinations

Recognizing an object's category and pose lies at the heart of visual un...
research
03/20/2022

3D Human Pose Estimation Using Möbius Graph Convolutional Networks

3D human pose estimation is fundamental to understanding human behavior....
research
04/07/2020

Capsule Networks – A Probabilistic Perspective

'Capsule' models try to explicitly represent the poses of objects, enfor...
