Robust 3D Action Recognition through Sampling Local Appearances and Global Distributions

12/04/2017
by Mengyuan Liu, et al.

3D action recognition has broad applications in human-computer interaction and intelligent surveillance. However, recognizing similar actions remains challenging because previous methods fail to capture motion and shape cues effectively from noisy depth data. In this paper, we propose a novel two-layer Bag-of-Visual-Words (BoVW) model that suppresses noise and jointly encodes motion and shape cues. First, background clutter is removed by a background modeling method designed for depth data. Then, motion and shape cues are used jointly to generate two kinds of robust and distinctive spatial-temporal interest points (STIPs): motion-based STIPs and shape-based STIPs. In the first layer of our model, a multi-scale 3D local steering kernel (M3DLSK) descriptor is proposed to describe the local appearance of cuboids around motion-based STIPs. In the second layer, a spatial-temporal vector (STV) descriptor is proposed to describe the spatial-temporal distribution of shape-based STIPs. Using the BoVW model, motion and shape cues are combined into a fused action representation. Our model performs favorably compared with common STIP detection and description methods. Thorough experiments verify that our model is effective in distinguishing similar actions and is robust to background clutter, partial occlusions, and pepper noise.
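
The encoding step described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that each layer's descriptors (M3DLSK for motion-based STIPs, STV for shape-based STIPs) are already available as NumPy arrays, builds one codebook per layer with plain k-means, and fuses the two histograms by simple concatenation. The abstract does not specify the fusion scheme, so concatenation is an assumption, and all function names and codebook sizes below are hypothetical.

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Learn k visual words with plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct descriptors.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].astype(float)
    for _ in range(iters):
        # Squared Euclidean distance from every descriptor to every center.
        dist = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def bovw_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest word; return an L1-normalised histogram."""
    if len(descriptors) == 0:
        return np.zeros(len(codebook))
    dist = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = dist.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def fuse(motion_hist, shape_hist):
    """Assumed fusion: concatenate the two per-layer histograms."""
    return np.concatenate([motion_hist, shape_hist])

# Random arrays stand in for real M3DLSK and STV descriptors (assumption).
rng = np.random.default_rng(1)
motion_desc = rng.normal(size=(200, 16))  # hypothetical first-layer descriptors
shape_desc = rng.normal(size=(150, 6))    # hypothetical second-layer descriptors

motion_words = build_codebook(motion_desc, k=8)
shape_words = build_codebook(shape_desc, k=4)

video_repr = fuse(bovw_histogram(motion_desc[:50], motion_words),
                  bovw_histogram(shape_desc[:40], shape_words))
print(video_repr.shape)
```

In practice the fused vector would be passed to a classifier; the choice of classifier and of codebook sizes is left open here because the abstract does not state them.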

research
07/22/2020

Video-ception Network: Towards Multi-Scale Efficient Asymmetric Spatial-Temporal Interactions

Previous video modeling methods leverage the cubic 3D convolution filter...
research
04/03/2017

Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection

General human action recognition requires understanding of various visua...
research
11/26/2020

Depth-Aware Action Recognition: Pose-Motion Encoding through Temporal Heatmaps

Most state-of-the-art methods for action recognition rely only on 2D spa...
research
05/10/2013

Shape Reconstruction and Recognition with Isolated Non-directional Cues

The paper investigates a hypothesis that our visual system groups visual...
research
08/27/2016

Spatio-temporal Aware Non-negative Component Representation for Action Recognition

This paper presents a novel mid-level representation for action recognit...
research
01/18/2017

Action Recognition: From Static Datasets to Moving Robots

Deep learning models have achieved state-of-the-art performance in reco...
research
08/29/2019

DWnet: Deep-Wide Network for 3D Action Recognition

We propose in this paper a deep-wide network (DWnet) which combines the ...