Skeleton Sequence and RGB Frame Based Multi-Modality Feature Fusion Network for Action Recognition

02/23/2022
by Xiaoguang Zhu, et al.

Action recognition has been a hot topic in computer vision because of its wide application in vision systems. Previous approaches improve accuracy by fusing the skeleton sequence and RGB video modalities. However, such methods face a trade-off between accuracy and efficiency because of the high complexity of the RGB video network. To solve this problem, we propose a multi-modality feature fusion network that combines the skeleton sequence with a single RGB frame instead of the RGB video, as the key information contained in the combination of skeleton sequence and RGB frame is close to that of the skeleton sequence and RGB video. In this way, the complementary information is retained while the complexity is reduced by a large margin. To better explore the correspondence between the two modalities, a two-stage fusion framework is introduced in the network. In the early fusion stage, we introduce a skeleton attention module that projects the skeleton sequence onto the single RGB frame to help the RGB branch focus on the limb movement regions. In the late fusion stage, we propose a cross-attention module that fuses the skeleton feature and the RGB feature by exploiting their correlation. Experiments on two benchmarks, NTU RGB+D and SYSU, show that the proposed model achieves competitive performance compared with state-of-the-art methods while reducing the complexity of the network.
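
The two fusion stages can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract does not specify how either module is built, so the sketch assumes Gaussian blobs around projected joints for the early-stage mask and plain single-head scaled dot-product attention (without learned projections) for the late stage. The function names `skeleton_attention_mask` and `cross_attention`, the `sigma` parameter, and all array sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

def skeleton_attention_mask(joints_xy, height, width, sigma=8.0):
    """Soft mask from 2D joint positions projected onto one frame.

    joints_xy: (T, J, 2) array of (x, y) pixel coordinates for T skeleton
    frames and J joints. Assumption: each joint contributes a Gaussian blob,
    and blobs are merged with a pixel-wise maximum.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=np.float64)
    for x, y in joints_xy.reshape(-1, 2):
        blob = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        mask = np.maximum(mask, blob)
    return mask

def cross_attention(query_feats, context_feats):
    """Scaled dot-product attention: each query attends over context tokens.

    query_feats: (Nq, D), context_feats: (Nc, D). Returns (Nq, D).
    """
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ context_feats

rng = np.random.default_rng(0)

# Early fusion (illustrative): weight a single RGB frame by the skeleton mask.
joints = rng.uniform(0, 64, size=(4, 25, 2))   # 4 skeleton frames, 25 joints
frame = rng.uniform(0, 1, size=(64, 64, 3))    # one RGB frame
mask = skeleton_attention_mask(joints, 64, 64)
attended_frame = frame * mask[..., None]

# Late fusion (illustrative): each modality attends to the other, then the
# pooled results are concatenated into one fused descriptor.
skeleton_feats = rng.normal(size=(25, 16))     # stand-in skeleton tokens
rgb_feats = rng.normal(size=(49, 16))          # stand-in RGB tokens
fused = np.concatenate([
    cross_attention(skeleton_feats, rgb_feats).mean(axis=0),
    cross_attention(rgb_feats, skeleton_feats).mean(axis=0),
])
```

In this sketch only one frame passes through the image branch, which is the source of the complexity saving the abstract describes relative to processing a full RGB video. The random inputs stand in for real joint coordinates and learned features.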

research
04/29/2020

Skeleton Focused Human Activity Recognition in RGB Video

The data-driven approach that learns an optimal representation of vision...
research
11/19/2021

Action Recognition with Domain Invariant Features of Skeleton Image

Due to the fast processing-speed and robustness it can achieve, skeleton...
research
02/28/2020

Infrared and 3D skeleton feature fusion for RGB-D action recognition

A challenge of skeleton-based action recognition is the difficulty to cl...
research
04/20/2019

EV-Action: Electromyography-Vision Multi-Modal Action Dataset

Multi-modal human motion analysis is a critical and attractive research ...
research
07/07/2023

Physical-aware Cross-modal Adversarial Network for Wearable Sensor-based Human Action Recognition

Wearable sensor-based Human Action Recognition (HAR) has made significan...
research
01/05/2021

Trear: Transformer-based RGB-D Egocentric Action Recognition

In this paper, we propose a Transformer-based RGB-D egocentric action re...
research
09/21/2023

Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision

Self-supervised representation learning for human action recognition has...
