Depth-Aware Action Recognition: Pose-Motion Encoding through Temporal Heatmaps

11/26/2020
by Mattia Segù, et al.

Most state-of-the-art methods for action recognition rely only on 2D spatial features encoding appearance, motion, or pose. However, 2D data lacks depth information, which is crucial for recognizing fine-grained actions. In this paper, we propose a depth-aware volumetric descriptor that encodes pose and motion information in a unified representation for action classification in the wild. Our framework is robust to many challenges inherent to action recognition, e.g., variation in viewpoint, scene, clothing, and body shape. The key component of our method is the Depth-Aware Pose Motion representation (DA-PoTion), a new video descriptor that encodes the 3D movement of semantic keypoints of the human body. Given a video, we use a state-of-the-art 3D human pose regressor to produce a heatmap for each human joint in each frame, and we give each heatmap a unique color code according to its relative time in the clip. We then aggregate these time-encoded 3D heatmaps across the clip for all human joints to obtain a fixed-size descriptor (DA-PoTion), which is suitable for classifying actions with a shallow 3D convolutional neural network (CNN). The DA-PoTion alone sets a new state of the art on the Penn Action dataset. Moreover, we exploit the intrinsic complementarity between our pose-motion descriptor and appearance-based approaches by combining it with the Inflated 3D ConvNet (I3D), setting a new state of the art on the JHMDB dataset.
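The abstract does not specify the exact color coding or normalization used by DA-PoTion, so the following is only a minimal sketch under stated assumptions. It assumes per-frame 3D joint heatmaps arranged as an array of shape (T, J, D, H, W) (frames, joints, depth, height, width), a hypothetical two-channel linear time ramp as the color code, and per-joint, per-channel max normalization. The function names `time_color` and `da_potion` are hypothetical. The sketch illustrates the core idea: weighting each frame's heatmaps by a time-dependent color and summing over frames produces a descriptor whose shape does not depend on the clip length T.

```python
import numpy as np


def time_color(t: int, num_frames: int) -> np.ndarray:
    """Hypothetical 2-channel color code for frame t in [0, num_frames - 1].

    Assumption: channel 0 ramps linearly from 0 to 1 over the clip and
    channel 1 ramps from 1 to 0, so early and late frames get distinct colors.
    """
    rel = 0.0 if num_frames == 1 else t / (num_frames - 1)
    return np.array([rel, 1.0 - rel])


def da_potion(heatmaps: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Aggregate time-colored 3D joint heatmaps into a fixed-size descriptor.

    heatmaps: non-negative array of shape (T, J, D, H, W).
    Returns an array of shape (J, C, D, H, W) with C = 2, independent of T.
    """
    if heatmaps.ndim != 5:
        raise ValueError("expected heatmaps of shape (T, J, D, H, W)")
    num_frames = heatmaps.shape[0]
    # (T, C): one color vector per frame, based on its relative time.
    colors = np.stack([time_color(t, num_frames) for t in range(num_frames)])
    # Sum over time of color-weighted heatmaps:
    # desc[j, c, d, h, w] = sum_t colors[t, c] * heatmaps[t, j, d, h, w]
    desc = np.einsum("tc,tjdhw->jcdhw", colors, heatmaps)
    # Assumed normalization: scale each joint/channel volume to a max of 1.
    maxes = desc.max(axis=(2, 3, 4), keepdims=True)
    return desc / np.maximum(maxes, eps)


# Usage: a joint at voxel (0, 0, 0) in the first frame and at (1, 1, 1)
# in the last frame lights up different color channels at those voxels.
hm = np.zeros((3, 1, 2, 2, 2))
hm[0, 0, 0, 0, 0] = 1.0
hm[2, 0, 1, 1, 1] = 1.0
desc = da_potion(hm)
print(desc.shape)  # (1, 2, 2, 2, 2)
```

In this sketch, channel 0 of the descriptor highlights where the joint was late in the clip and channel 1 where it was early, so the direction of 3D motion is recoverable from a single volume that a 3D CNN could then classify.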
