Two-stream Multi-level Dynamic Point Transformer for Two-person Interaction Recognition

07/22/2023
by Yao Liu, et al.

As a fundamental aspect of human life, two-person interactions carry meaningful information about people's activities, relationships, and social settings. Human action recognition underpins many smart applications, in which personal privacy is a strong concern. Recognizing two-person interactions is harder than recognizing single-person actions because of increased body occlusion and overlap. In this paper, we propose a point cloud-based network named Two-stream Multi-level Dynamic Point Transformer for two-person interaction recognition. Our model addresses this challenge by incorporating local-region spatial information, appearance information, and motion information. To achieve this, we introduce a frame selection method named Interval Frame Sampling (IFS), which efficiently samples frames from videos, capturing more discriminative information in a relatively short processing time. Subsequently, a frame feature learning module and a two-stream multi-level feature aggregation module extract global and partial features from the sampled frames, effectively representing the local-region spatial, appearance, and motion information related to the interactions. Finally, we apply a transformer to perform self-attention on the learned features for the final classification. Extensive experiments are conducted on two large-scale datasets, the interaction subsets of NTU RGB+D 60 and NTU RGB+D 120. The results show that our network outperforms state-of-the-art approaches across all standard evaluation settings.
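The abstract does not spell out how Interval Frame Sampling chooses frames, so the following is only a minimal sketch of one common interval-based scheme, not the authors' method. It assumes the sequence is split into near-equal intervals and one frame index is taken from each. The function name `interval_frame_sampling`, its parameters, and the choice between a centre frame and a random frame are all hypothetical.

```python
import random


def interval_frame_sampling(num_frames, num_samples, rng=None):
    """Return one frame index from each of `num_samples` near-equal intervals.

    Assumption (not from the paper): the sequence [0, num_frames) is split
    into `num_samples` contiguous intervals. With rng=None the centre frame
    of each interval is chosen; otherwise a random frame inside it.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")

    indices = []
    for i in range(num_samples):
        # Integer interval boundaries [start, end) for the i-th interval.
        start = (i * num_frames) // num_samples
        end = ((i + 1) * num_frames) // num_samples
        if end <= start:
            # Happens when num_samples > num_frames: reuse a single frame.
            end = start + 1
        if rng is None:
            idx = (start + end - 1) // 2  # deterministic centre frame
        else:
            idx = rng.randrange(start, end)  # random frame in the interval
        indices.append(idx)
    return indices


if __name__ == "__main__":
    # Deterministic sampling of 5 frames from a 10-frame clip.
    print(interval_frame_sampling(10, 5))
    # Randomised sampling, e.g. as a training-time augmentation.
    print(interval_frame_sampling(100, 8, rng=random.Random(0)))
```

Sampling one frame per interval keeps the selected frames spread over the whole clip, which is one plausible way to cover an interaction from start to end at a fixed frame budget.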

research
11/16/2021

SequentialPointNet: A strong parallelized point cloud sequence network for 3D action recognition

Point cloud sequences of 3D human actions exhibit unordered intra-frame ...
research
08/17/2020

Spatial Temporal Transformer Network for Skeleton-based Action Recognition

Skeleton-based Human Activity Recognition has achieved a great interest ...
research
08/19/2022

SoMoFormer: Social-Aware Motion Transformer for Multi-Person Motion Prediction

Multi-person motion prediction remains a challenging problem, especially...
research
09/06/2022

Real-Time Cattle Interaction Recognition via Triple-stream Network

In stockbreeding of beef cattle, computer vision-based approaches have b...
research
10/11/2019

Interaction Relational Network for Mutual Action Recognition

Person-person mutual action recognition (also referred to as interaction...
research
02/10/2020

Joint Encoding of Appearance and Motion Features with Self-supervision for First Person Action Recognition

Wearable cameras are becoming more and more popular in several applicati...
research
03/15/2016

First Person Action-Object Detection with EgoNet

Unlike traditional third-person cameras mounted on robots, a first-perso...