Interactive Spatiotemporal Token Attention Network for Skeleton-based General Interactive Action Recognition

07/14/2023
by Yuhang Wen, et al.

Recognizing interactive actions plays an important role in human-robot interaction and collaboration. Previous methods use late fusion and co-attention mechanisms to capture interactive relations, but these have limited learning capability or adapt inefficiently to larger numbers of interacting entities. Because they assume that the priors of each entity are already known, they also lack evaluation in a more general setting that addresses the diversity of subjects. To address these problems, we propose an Interactive Spatiotemporal Token Attention Network (ISTA-Net), which simultaneously models spatial, temporal, and interactive relations. Specifically, our network contains a tokenizer that partitions the input into Interactive Spatiotemporal Tokens (ISTs), a unified way to represent the motions of multiple diverse entities. By extending the entity dimension, ISTs provide better interactive representations. To learn jointly along the three dimensions of ISTs, multi-head self-attention blocks integrated with 3D convolutions are designed to capture inter-token correlations. When modeling correlations, a strict entity ordering is usually irrelevant for recognizing interactive actions. To this end, Entity Rearrangement is proposed to eliminate the orderliness in ISTs for interchangeable entities. Extensive experiments on four datasets verify the effectiveness of ISTA-Net, which outperforms state-of-the-art methods. Our code is publicly available at https://github.com/Necolizer/ISTA-Net
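To make the entity dimension and the idea of Entity Rearrangement more concrete, the toy sketch below shuffles the entity axis of a skeleton sample and then groups frames into tokens that span every entity. It is not the authors' implementation: the nested-list layout `sample[entity][frame][joint]`, the purely temporal windowing, and the names `rearrange_entities` and `partition_tokens` are all assumptions made for illustration. The released repository should be consulted for the actual tokenizer and attention blocks.

```python
import random


def rearrange_entities(sample, rng):
    """Return a copy of `sample` with its entity axis randomly permuted.

    Assumed layout (hypothetical): sample[e][t][j] is the coordinate tuple
    of joint j of entity e at frame t.
    """
    order = list(range(len(sample)))
    rng.shuffle(order)
    return [sample[e] for e in order]


def partition_tokens(sample, window):
    """Split the sequence into non-overlapping temporal windows.

    Each token keeps every entity's joints for `window` consecutive frames,
    so a token spans the entity, temporal, and spatial dimensions.
    Trailing frames that do not fill a whole window are dropped.
    """
    num_frames = len(sample[0])
    tokens = []
    for start in range(0, num_frames - window + 1, window):
        tokens.append([entity[start:start + window] for entity in sample])
    return tokens


# Toy input: 2 entities, 4 frames, 3 joints, 3D coordinates.
sample = [
    [[(float(e), float(t), float(j)) for j in range(3)] for t in range(4)]
    for e in range(2)
]

shuffled = rearrange_entities(sample, random.Random(0))
tokens = partition_tokens(shuffled, window=2)
```

In this sketch, shuffling the entity order before tokenization means a downstream model never sees a fixed ordering of interchangeable entities, which is the intuition the abstract gives for Entity Rearrangement.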


