Learning by Aligning 2D Skeleton Sequences in Time

05/31/2023
by Quoc-Huy Tran, et al.

This paper presents a novel self-supervised temporal video alignment framework that is useful for several fine-grained human activity understanding applications. Unlike the state-of-the-art method CASA, which takes sequences of 3D skeleton coordinates directly as input, our key idea is to use sequences of 2D skeleton heatmaps as input. Given 2D skeleton heatmaps, we use a video transformer that performs self-attention in the spatial and temporal domains to extract effective spatiotemporal and contextual features. In addition, we introduce simple heatmap augmentation techniques based on 2D skeletons for self-supervised learning. Despite the lack of 3D information, our approach achieves not only higher accuracy but also better robustness against missing and noisy keypoints than CASA. Extensive evaluations on three public datasets (Penn Action, IKEA ASM, and H2O) demonstrate that our approach outperforms previous methods on four fine-grained human activity understanding tasks: phase classification, phase progression, video alignment, and fine-grained frame retrieval.
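To make the input representation concrete, here is a minimal sketch of how 2D keypoints for one frame might be rendered into per-joint heatmaps. It is an illustration only, not the authors' implementation: the function names `keypoints_to_heatmaps` and `translate_keypoints`, the Gaussian rendering, the `sigma` parameter, and the translation augmentation are all assumptions, since the abstract does not specify how heatmaps are built or which augmentations are used.

```python
import math
from typing import List, Optional, Sequence, Tuple

# One 2D joint: (x, y) in pixel coordinates, or None if the detector missed it.
Keypoint = Optional[Tuple[float, float]]


def keypoints_to_heatmaps(
    keypoints: Sequence[Keypoint],
    height: int,
    width: int,
    sigma: float = 2.0,
) -> List[List[List[float]]]:
    """Render one (height x width) Gaussian heatmap per joint.

    Assumption: each joint becomes an isotropic Gaussian centred on its
    location; a missing joint (None) becomes an all-zero map.
    """
    heatmaps = []
    for kp in keypoints:
        if kp is None:
            heatmaps.append([[0.0] * width for _ in range(height)])
            continue
        cx, cy = kp
        heatmaps.append([
            [
                math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
                for x in range(width)
            ]
            for y in range(height)
        ])
    return heatmaps


def translate_keypoints(
    keypoints: Sequence[Keypoint], dx: float, dy: float
) -> List[Keypoint]:
    """Hypothetical augmentation: shift every detected joint by (dx, dy)."""
    return [None if kp is None else (kp[0] + dx, kp[1] + dy) for kp in keypoints]


# Usage: two joints in a single frame, the second one missing.
frame = [(3.0, 2.0), None]
maps = keypoints_to_heatmaps(frame, height=5, width=8, sigma=1.0)
shifted_maps = keypoints_to_heatmaps(
    translate_keypoints(frame, dx=1.0, dy=1.0), height=5, width=8, sigma=1.0
)
```

Stacking such maps over joints and frames gives a video-like tensor, which is one plausible reason a video transformer can be applied to it. In this sketch a missing keypoint simply produces an empty channel rather than an invalid coordinate.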

research
09/12/2023

Action Segmentation Using 2D Skeleton Heatmaps

This paper presents a 2D skeleton-based action segmentation method with ...
research
04/26/2022

Context-Aware Sequence Alignment using 4D Skeletal Augmentation

Temporal alignment of fine-grained human actions in videos is important ...
research
08/03/2017

Unsupervised Video Understanding by Reconciliation of Posture Similarities

Understanding human activity and being able to explain it in detail surp...
research
05/31/2023

Permutation-Aware Action Segmentation via Unsupervised Frame-to-Segment Alignment

This paper presents a novel transformer-based framework for unsupervised...
research
06/08/2023

Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment

The egocentric and exocentric viewpoints of a human activity look dramat...
research
09/11/2023

SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-supervised Skeleton-based Action Recognition

Contrastive learning has achieved great success in skeleton-based action...
research
11/24/2022

Spatial Mixture-of-Experts

Many data have an underlying dependence on spatial location; it may be w...
