View-Invariant Skeleton-based Action Recognition via Global-Local Contrastive Learning

by Cunling Bian, et al.
Tianjin University

Skeleton-based human action recognition has drawn increasing interest recently due to its low sensitivity to appearance changes and the growing availability of skeleton data. However, even the 3D skeletons captured in practice remain sensitive to viewpoint and direction, given the occlusion of different human-body joints and the errors in human joint localization. Such view variance in skeleton data may significantly affect the performance of action recognition. To address this issue, we propose a new view-invariant representation learning approach for skeleton-based human action recognition that requires no manual action labeling. Specifically, we leverage multi-view skeleton data captured simultaneously for the same person during network training by maximizing the mutual information between the representations extracted from different views, and we propose a global-local contrastive loss to model the multi-scale co-occurrence relationships in both the spatial and temporal domains. Extensive experimental results show that the proposed method is robust to view differences in the input skeleton data and significantly boosts the performance of unsupervised skeleton-based human action recognition methods, achieving new state-of-the-art accuracies on two challenging multi-view benchmarks, PKU-MMD and NTU RGB+D.
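
To make the idea of maximizing mutual information across views more concrete, the following is a minimal sketch of a generic cross-view InfoNCE loss, a standard contrastive objective whose minimization tightens a lower bound on mutual information. It is not the global-local loss proposed in the paper, whose details the abstract does not give. The function names, the `temperature` value, and the toy embeddings are all illustrative assumptions, and plain Python lists stand in for the outputs of a skeleton encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_view_info_nce(view_a, view_b, temperature=0.1):
    """Generic InfoNCE loss between two batches of embeddings.

    Assumption: row i of view_a and row i of view_b embed the same action
    clip recorded from two camera views (the positive pair); every other
    row of view_b serves as a negative for row i of view_a.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    n = len(a)
    total = 0.0
    for i in range(n):
        # Cosine similarity of sample i in view A to every sample in view B.
        logits = [
            sum(x * y for x, y in zip(a[i], b[j])) / temperature
            for j in range(n)
        ]
        # Numerically stable log-sum-exp over positives and negatives.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching view as the correct "class".
        total += log_denominator - logits[i]
    return total / n


# Toy embeddings: hypothetical encoder outputs for two clips seen from two views.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.1], [0.1, 1.0]]

aligned_loss = cross_view_info_nce(view_a, view_b)
shuffled_loss = cross_view_info_nce(view_a, list(reversed(view_b)))
```

In this toy setup the loss is small when matching rows describe the same clip and large when the pairing is shuffled, which is the behavior a view-invariant encoder would be trained toward.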


Cross-view Action Recognition via Contrastive View-invariant Representation

Cross view action recognition (CVAR) seeks to recognize a human action w...

A Large-scale Varying-view RGB-D Action Dataset for Arbitrary-view Human Action Recognition

Current researches of action recognition mainly focus on single-view and...

Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization

We introduce a novel representation learning method to disentangle pose-...

Local Spherical Harmonics Improve Skeleton-Based Hand Action Recognition

Hand action recognition is essential. Communication, human-robot interac...

Improving Video Violence Recognition with Human Interaction Learning on 3D Skeleton Point Clouds

Deep learning has proved to be very effective in video action recognitio...

Shifting Perspective to See Difference: A Novel Multi-View Method for Skeleton based Action Recognition

Skeleton-based human action recognition is a longstanding challenge due ...

View Adaptive Neural Networks for High Performance Skeleton-based Human Action Recognition

Skeleton-based human action recognition has recently attracted increasin...
