Hierarchically Self-Supervised Transformer for Human Skeleton Representation Learning

07/20/2022
by Yuxiao Chen, et al.

Despite the success of fully supervised human skeleton sequence modeling, self-supervised pre-training for skeleton sequence representation learning has been an active field because acquiring task-specific skeleton annotations at scale is difficult. Recent studies focus on learning video-level temporal and discriminative information using contrastive learning, but overlook the hierarchical spatial-temporal nature of human skeletons. Unlike such superficial supervision at the video level, we propose a self-supervised hierarchical pre-training scheme incorporated into a hierarchical Transformer-based skeleton sequence encoder (Hi-TRS) to explicitly capture spatial, short-term temporal, and long-term temporal dependencies at the frame, clip, and video levels, respectively. To evaluate the proposed self-supervised pre-training scheme with Hi-TRS, we conduct extensive experiments covering three skeleton-based downstream tasks: action recognition, action detection, and motion prediction. Under both supervised and semi-supervised evaluation protocols, our method achieves state-of-the-art performance. Additionally, we demonstrate that the prior knowledge learned by our model in the pre-training stage transfers well to different downstream tasks.
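
The frame, clip, and video hierarchy described above can be illustrated with a minimal sketch. This is not the Hi-TRS implementation: mean pooling stands in for each Transformer encoder, and the function names, the clip length, and the stride are all assumptions chosen for illustration, since the abstract does not specify them. The sketch only shows how a skeleton sequence might be grouped so that each level sees a different scope (joints within a frame, neighboring frames within a clip, clips within a video).

```python
from typing import List

Vector = List[float]
Frame = List[Vector]  # one frame: J joints, each with C coordinates


def mean_pool(vectors: List[Vector]) -> Vector:
    """Placeholder for an encoder: average a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def make_clips(num_frames: int, clip_len: int, stride: int) -> List[range]:
    """Sliding windows of frame indices (clip_len and stride are assumed values)."""
    if num_frames < clip_len:
        return [range(0, num_frames)]
    return [range(s, s + clip_len)
            for s in range(0, num_frames - clip_len + 1, stride)]


def encode_hierarchically(video: List[Frame], clip_len: int = 4, stride: int = 2):
    # Frame level: joints within one frame -> one vector per frame (spatial).
    frame_feats = [mean_pool(frame) for frame in video]
    # Clip level: neighboring frames -> one vector per clip (short-term temporal).
    clips = make_clips(len(video), clip_len, stride)
    clip_feats = [mean_pool([frame_feats[t] for t in clip]) for clip in clips]
    # Video level: all clips -> one vector per video (long-term temporal).
    video_feat = mean_pool(clip_feats)
    return frame_feats, clip_feats, video_feat


# Toy input: 8 frames, 3 joints, 3 coordinates per joint.
toy_video = [[[float(t + j + c) for c in range(3)] for j in range(3)]
             for t in range(8)]
frame_feats, clip_feats, video_feat = encode_hierarchically(toy_video)
print(len(frame_feats), len(clip_feats), len(video_feat))
```

With the toy input, the sketch yields one feature per frame, one per sliding-window clip, and a single video-level vector. In an actual model each `mean_pool` call would be replaced by a learned encoder, and each level could receive its own self-supervised objective.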
