Spatio-Temporal Tuples Transformer for Skeleton-Based Action Recognition

01/08/2022
by   Helei Qiu, et al.
Capturing the dependencies between joints is critical in skeleton-based action recognition. Transformers show great potential for modeling the correlation of important joints. However, existing Transformer-based methods cannot capture the correlation of different joints between frames, even though this correlation is very useful: different body parts (such as the arms and legs in "long jump") move together across adjacent frames. To address this problem, a novel spatio-temporal tuples Transformer (STTFormer) method is proposed. The skeleton sequence is divided into several parts, and the consecutive frames contained in each part are encoded. A spatio-temporal tuples self-attention module is then proposed to capture the relationship of different joints in consecutive frames. In addition, a feature aggregation module is introduced between non-adjacent frames to enhance the ability to distinguish similar actions. Compared with state-of-the-art methods, our method achieves better performance on two large-scale datasets.
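The idea of dividing the sequence into parts and attending over all joints within each part can be illustrated with a small sketch. This is not the authors' implementation: the function names (`split_into_tuples`, `tuple_self_attention`), the tuple length of 2, the choice to drop trailing frames, and the use of raw joint features as queries, keys, and values (no learned projections) are all assumptions made for illustration.

```python
import math

def split_into_tuples(seq, n):
    """Split a list of frames into non-overlapping groups of n consecutive frames.
    Assumption: trailing frames that do not fill a group are dropped."""
    usable = len(seq) - len(seq) % n
    return [seq[i:i + n] for i in range(0, usable, n)]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def tuple_self_attention(frames):
    """Scaled dot-product self-attention over every joint of every frame in one tuple.
    frames: list of n frames, each a list of V joint feature vectors.
    Assumption: queries, keys and values are the raw features (identity projections).
    Returns (outputs, attn) with n*V output vectors and an (n*V) x (n*V) attention matrix."""
    # Flatten the tuple so joints from different frames become tokens in one set.
    tokens = [joint for frame in frames for joint in frame]
    d = len(tokens[0])
    scale = math.sqrt(d)
    attn = [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in tokens])
        for q in tokens
    ]
    outputs = [
        [sum(w * t[c] for w, t in zip(row, tokens)) for c in range(d)]
        for row in attn
    ]
    return outputs, attn

# Toy skeleton sequence: 6 frames, 3 joints, 2-D features per joint.
T, V = 6, 3
sequence = [[[float(t), float(v)] for v in range(V)] for t in range(T)]

tuples = split_into_tuples(sequence, 2)
out, attn = tuple_self_attention(tuples[0])
print(len(tuples), len(out), len(attn[0]))  # prints: 3 6 6
```

Because the tokens of a tuple are flattened before attention, each row of `attn` contains weights between a joint in one frame and every joint in the neighbouring frame as well as its own, which is the kind of cross-frame, cross-joint relationship the abstract describes.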


