Log In Sign Up

Attention-Driven Body Pose Encoding for Human Activity Recognition

by   B Debnath, et al.

This article proposes a novel attention-based body pose encoding for human activity recognition that presents a enriched representation of body-pose that is learned. The enriched data complements the 3D body joint position data and improves model performance. In this paper, we propose a novel approach that learns enhanced feature representations from a given sequence of 3D body joints. To achieve this encoding, the approach exploits 1) a spatial stream which encodes the spatial relationship between various body joints at each time point to learn spatial structure involving the spatial distribution of different body joints 2) a temporal stream that learns the temporal variation of individual body joints over the entire sequence duration to present a temporally enhanced representation. Afterwards, these two pose streams are fused with a multi-head attention mechanism. translation. We also capture the contextual information from the RGB video stream using a Inception-ResNet-V2 model combined with a multi-head attention and a bidirectional Long Short-Term Memory (LSTM) network. performance is enhanced through the multi-head attention mechanism. Finally, the RGB video stream is combined with the fused body pose stream to give a novel end-to-end deep model for effective human activity recognition.


page 1

page 2

page 3

page 4


Pose-conditioned Spatio-Temporal Attention for Human Action Recognition

We address human action recognition from multi-modal video data involvin...

Federated Multi-task Hierarchical Attention Model for Sensor Analytics

Sensors are an integral part of modern Internet of Things (IoT) applicat...

Coarse Temporal Attention Network (CTA-Net) for Driver's Activity Recognition

There is significant progress in recognizing traditional human activitie...

Follow the Attention: Combining Partial Pose and Object Motion for Fine-Grained Action Detection

Activity recognition in shopping environments is an important and challe...

Recurrent Network Models for Human Dynamics

We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and...

Human Interaction Recognition Framework based on Interacting Body Part Attention

Human activity recognition in videos has been widely studied and has rec...