Action Transformer: A Self-Attention Model for Short-Time Human Action Recognition

07/01/2021
by   Vittorio Mazzia, et al.

Deep neural networks based purely on attention have been successful across several domains while relying on minimal architectural priors from the designer. In Human Action Recognition (HAR), attention mechanisms have mostly been added on top of standard convolutional or recurrent layers to improve overall generalization. In this work, we introduce Action Transformer (AcT), a simple, fully self-attentional architecture that consistently outperforms more elaborate networks that mix convolutional, recurrent, and attentive layers. To limit computational and energy requirements, and building on previous human action recognition research, the proposed approach exploits 2D pose representations over small temporal windows, providing a low-latency solution for accurate and effective real-time performance. Moreover, we open-source MPOSE2021, a new large-scale dataset, as an attempt to build a formal training and evaluation benchmark for real-time, short-time human action recognition. Extensive experimentation on MPOSE2021 with our proposed methodology and several previous architectural solutions demonstrates the effectiveness of the AcT model and lays the groundwork for future work on HAR.
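To make the idea of self-attention over a short window of 2D poses more concrete, the following is a minimal NumPy sketch. It is not the AcT architecture: the abstract does not specify layer counts, heads, positional encodings, or the classification head, so this uses a single untrained attention layer with mean pooling. The window length, keypoint count, embedding size, number of classes, and all function names are hypothetical choices for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (T, d) sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (T, T) frame-to-frame scores
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


def init_params(num_keypoints, d_model, num_classes, seed=0):
    """Random, untrained weights (assumption: no training is shown here)."""
    rng = np.random.default_rng(seed)
    shapes = {
        "w_embed": (num_keypoints * 2, d_model),  # flattened (x, y) pairs -> token
        "w_q": (d_model, d_model),
        "w_k": (d_model, d_model),
        "w_v": (d_model, d_model),
        "w_out": (d_model, num_classes),
    }
    return {name: rng.normal(0.0, 0.1, size=shape) for name, shape in shapes.items()}


def classify_window(poses, params):
    """Map one window of 2D poses, shape (T, K, 2), to class probabilities."""
    num_frames = poses.shape[0]
    tokens = poses.reshape(num_frames, -1) @ params["w_embed"]  # one token per frame
    attended, weights = self_attention(
        tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    pooled = attended.mean(axis=0)            # average over the temporal window
    return softmax(pooled @ params["w_out"]), weights


# Hypothetical sizes: 30 frames, 13 keypoints, 64-dim tokens, 20 action classes.
NUM_FRAMES, NUM_KEYPOINTS, D_MODEL, NUM_CLASSES = 30, 13, 64, 20
params = init_params(NUM_KEYPOINTS, D_MODEL, NUM_CLASSES)
poses = np.random.default_rng(1).uniform(size=(NUM_FRAMES, NUM_KEYPOINTS, 2))
probs, attn = classify_window(poses, params)
```

Because each frame's pose becomes one token, the attention matrix has only `NUM_FRAMES × NUM_FRAMES` entries, which is one reason short temporal windows over poses keep the computation small compared with attention over raw video pixels.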


research
03/19/2022

DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition

Human action recognition has recently become one of the popular research...
research
12/19/2021

Precondition and Effect Reasoning for Action Recognition

Human action recognition has drawn a lot of attention in the recent year...
research
04/01/2021

Keyword Transformer: A Self-Attention Model for Keyword Spotting

The Transformer architecture has been successful across many domains, in...
research
12/20/2017

Human Action Recognition: Pose-based Attention draws focus to Hands

We propose a new spatio-temporal attention based mechanism for human act...
research
10/29/2018

ActionXPose: A Novel 2D Multi-view Pose-based Algorithm for Real-time Human Action Recognition

We present ActionXPose, a novel 2D pose-based algorithm for posture-leve...
research
06/30/2023

SpATr: MoCap 3D Human Action Recognition based on Spiral Auto-encoder and Transformer Network

Recent advancements in technology have expanded the possibilities of hum...
research
04/17/2021

Higher Order Recurrent Space-Time Transformer

Endowing visual agents with predictive capability is a key step towards ...
