CNN based Multistage Gated Average Fusion (MGAF) for Human Action Recognition Using Depth and Inertial Sensors

10/29/2020
by   Zeeshan Ahmad, et al.

A Convolutional Neural Network (CNN) makes it possible to extract and fuse features from all layers of its architecture. However, extracting and fusing intermediate features from different layers of a CNN has remained uninvestigated for Human Action Recognition (HAR) using depth and inertial sensors. To take full advantage of access to all of the CNN's layers, in this paper we propose a novel Multistage Gated Average Fusion (MGAF) network that extracts and fuses features from all layers of a CNN using our novel, computationally efficient Gated Average Fusion (GAF) network, the decisive integral component of MGAF. At the input of the proposed MGAF, we transform the depth and inertial sensor data into depth images called sequential front view images (SFI) and into signal images (SI), respectively. The SFI are formed from the front-view information generated by the depth data. A CNN is employed to extract feature maps from both input modalities. The GAF network fuses the extracted features effectively while also preserving the dimensionality of the fused features. The proposed MGAF network is structurally extensible and can be unfolded to more than two modalities. Experiments on three publicly available multimodal HAR datasets demonstrate that the proposed MGAF outperforms previous state-of-the-art fusion methods for depth-inertial HAR in terms of recognition accuracy while being computationally much more efficient: it increases accuracy by an average of 1.5 percent while reducing computational cost by approximately 50 percent relative to the previous state of the art.
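The abstract describes GAF as a step that fuses feature maps from two modalities while preserving their dimensionality. The exact gating equations are not given in this abstract; the numpy sketch below shows one plausible channel-wise gated average, in which the gate is computed from the concatenated features and the parameters `w_gate` and `b_gate` are hypothetical names introduced here for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_average_fusion(feat_a, feat_b, w_gate, b_gate):
    """Fuse two same-shaped feature maps with a learned gate (illustrative sketch).

    feat_a, feat_b : arrays of shape (N, C) -- flattened feature maps from
                     the two modalities (e.g. depth SFI and inertial SI streams).
    w_gate, b_gate : gate parameters of shape (2*C, C) and (C,) -- assumed
                     form; the paper's actual gating may differ.
    """
    # Gate is conditioned on both modalities (assumption).
    concat = np.concatenate([feat_a, feat_b], axis=-1)      # (N, 2C)
    g = sigmoid(concat @ w_gate + b_gate)                   # (N, C), values in (0, 1)
    # Convex combination: output has the same shape as each input,
    # so dimensionality is preserved, as the abstract requires.
    return g * feat_a + (1.0 - g) * feat_b
```

Because the fusion is an element-wise convex combination, each fused value lies between the corresponding values of the two input feature maps, and the output can be fed to the next CNN stage without any reshaping.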

Related research

- 08/22/2020: Towards Improved Human Action Recognition Using Convolutional Neural Networks and Multimodal Fusion of Depth and Inertial Sensor Data
  This paper attempts at improving the accuracy of Human Action Recognitio...

- 10/25/2019: Human Action Recognition Using Deep Multilevel Multimodal (M2) Fusion of Depth and Inertial Sensors
  Multimodal fusion frameworks for Human Action Recognition (HAR) using de...

- 05/28/2021: Inertial Sensor Data To Image Encoding For Human Action Recognition
  Convolutional Neural Networks (CNNs) are successful deep learning models...

- 02/17/2020: DeepDualMapper: A Gated Fusion Network for Automatic Map Extraction using Aerial Images and Trajectories
  Automatic map extraction is of great importance to urban computing and l...

- 11/10/2015: TemplateNet for Depth-Based Object Instance Recognition
  We present a novel deep architecture termed templateNet for depth based ...

- 03/13/2020: Gimme Signals: Discriminative signal encoding for multimodal activity recognition
  We present a simple, yet effective and flexible method for action recogn...

- 08/07/2016: Multiview Cauchy Estimator Feature Embedding for Depth and Inertial Sensor-Based Human Action Recognition
  The ever-growing popularity of Kinect and inertial sensors has prompted ...
