UNIK: A Unified Framework for Real-world Skeleton-based Action Recognition

07/19/2021
by Di Yang, et al.

Action recognition based on skeleton data has recently attracted increasing attention and seen considerable progress. State-of-the-art approaches adopting Graph Convolutional Networks (GCNs) can effectively extract features from human skeletons by relying on a pre-defined human topology. Despite this progress, GCN-based methods have difficulty generalizing across domains, especially across datasets with different human topological structures. In this context, we introduce UNIK, a novel skeleton-based action recognition method that is not only effective at learning spatio-temporal features on human skeleton sequences but also able to generalize across datasets. This is achieved by learning an optimal dependency matrix, initialized from the uniform distribution, based on a multi-head attention mechanism. Subsequently, to study the cross-domain generalizability of skeleton-based action recognition in real-world videos, we re-evaluate state-of-the-art approaches as well as the proposed UNIK on a novel dataset, Posetics. This dataset is created from Kinetics-400 videos by estimating, refining and filtering poses. We provide an analysis of how much performance improves on smaller benchmark datasets after pre-training on Posetics for the action classification task. Experimental results show that the proposed UNIK, with pre-training on Posetics, generalizes well and outperforms the state of the art when transferred to four target action classification datasets: Toyota Smarthome, Penn Action, NTU-RGB+D 60 and NTU-RGB+D 120.
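
To make the idea of a learned dependency matrix more concrete, here is a minimal NumPy sketch of one possible reading of the abstract. It is not the authors' implementation. The function names, tensor shapes, the sampling range [0, 1), and the choice to sum the heads are all assumptions made for illustration. The sketch only shows a forward pass; in a trainable model the dependency matrices would be parameters updated by gradient descent.

```python
import numpy as np


def init_dependency_matrices(num_heads, num_joints, rng):
    """Draw one joint-to-joint matrix per head from a uniform distribution.

    Assumption: entries are sampled from [0, 1). No skeleton adjacency is
    used, so the same code works for any number of joints.
    """
    return rng.uniform(0.0, 1.0, size=(num_heads, num_joints, num_joints))


def spatial_layer(x, dep, weights):
    """Hypothetical multi-head spatial layer (forward pass only).

    x:       (C_in, T, V)     input features: channels, frames, joints
    dep:     (H, V, V)        one dependency matrix per head
    weights: (H, C_out, C_in) one channel projection per head
    returns: (C_out, T, V)
    """
    # Mix information across joints with each head's dependency matrix.
    mixed = np.einsum("ctu,huv->hctv", x, dep)
    # Project channels per head, then sum the heads (an assumed aggregation).
    return np.einsum("hoc,hctv->otv", weights, mixed)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two skeleton layouts with different joint counts share the same code path.
    for num_joints in (17, 25):
        dep = init_dependency_matrices(num_heads=3, num_joints=num_joints, rng=rng)
        x = rng.standard_normal((2, 5, num_joints))
        w = rng.standard_normal((3, 4, 2))
        print(num_joints, spatial_layer(x, dep, w).shape)
```

Because nothing in the sketch depends on a fixed adjacency, switching between skeleton layouts only changes the size of the matrices, which is the property the abstract associates with cross-dataset generalization.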


research
11/11/2020

Progressive Spatio-Temporal Graph Convolutional Network for Skeleton-Based Human Action Recognition

Graph convolutional networks (GCNs) have been very successful in skeleto...
research
05/31/2022

Skeleton-based Action Recognition via Temporal-Channel Aggregation

Skeleton-based action recognition methods are limited by the semantic ex...
research
05/22/2017

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

The paucity of videos in current action classification datasets (UCF-101...
research
08/07/2023

ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition

Video Action Recognition (VAR) is a challenging task due to its inherent...
research
11/10/2020

Selective Spatio-Temporal Aggregation Based Pose Refinement System: Towards Understanding Human Activities in Real-World Videos

Taking advantage of human pose data for understanding human activities h...
research
04/14/2023

Skeleton-based action analysis for ADHD diagnosis

Attention Deficit Hyperactivity Disorder (ADHD) is a common neurobehavio...
research
07/17/2022

Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition

Rapid progress and superior performance have been achieved for skeleton-...
