Spatial-Temporal Transformer for 3D Point Cloud Sequences

10/19/2021
by Yimin Wei, et al.

Effective learning of spatial-temporal information within a point cloud sequence is highly important for many downstream tasks, such as 4D semantic segmentation and 3D action recognition. In this paper, we propose a novel framework named Point Spatial-Temporal Transformer (PST2) to learn spatial-temporal representations from dynamic 3D point cloud sequences. Our PST2 consists of two major modules: a Spatio-Temporal Self-Attention (STSA) module and a Resolution Embedding (RE) module. The STSA module captures spatial-temporal context information across adjacent frames, while the RE module aggregates features across neighbors to enhance the resolution of feature maps. We test the effectiveness of our PST2 on two different tasks on point cloud sequences, i.e., 4D semantic segmentation and 3D action recognition. Extensive experiments on three benchmarks show that our PST2 outperforms existing methods on all datasets. Ablation experiments further validate the effectiveness of the STSA and RE modules.
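The abstract does not specify how STSA or RE are computed. The sketch below is therefore only an illustration of the general idea under stated assumptions, not the authors' method. It shows (a) generic scaled dot-product attention in which each point of frame t attends to its k nearest neighbours pooled from frames t-1, t and t+1, and (b) a generic k-nearest-neighbour max-pooling as a stand-in for "aggregating features across neighbors". All function names, weight matrices (`Wq`, `Wk`, `Wv`) and parameters (`k`, feature sizes) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def knn_indices(query_xyz, ref_xyz, k):
    # Brute-force kNN: squared distances of shape (Nq, Nr), keep k closest.
    d = ((query_xyz[:, None, :] - ref_xyz[None, :, :]) ** 2).sum(-1)
    return np.argsort(d, axis=1)[:, :k]

def spatio_temporal_attention(xyz, feats, t, k, Wq, Wk, Wv):
    """Hypothetical sketch: points of frame t attend to their k nearest
    neighbours gathered from frames t-1, t, t+1 (clipped to valid frames)."""
    T = len(xyz)
    frames = [f for f in (t - 1, t, t + 1) if 0 <= f < T]
    ref_xyz = np.concatenate([xyz[f] for f in frames], axis=0)
    ref_feat = np.concatenate([feats[f] for f in frames], axis=0)
    idx = knn_indices(xyz[t], ref_xyz, k)       # (N, k)
    q = feats[t] @ Wq                           # (N, d) queries
    keys = ref_feat[idx] @ Wk                   # (N, k, d) keys
    vals = ref_feat[idx] @ Wv                   # (N, k, d) values
    scores = np.einsum('nd,nkd->nk', q, keys) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)             # weights over k neighbours
    return np.einsum('nk,nkd->nd', attn, vals)  # (N, d)

def neighbor_aggregate(xyz_t, feats_t, k):
    # Hypothetical stand-in for neighbour aggregation: max-pool the
    # features of each point's k nearest neighbours within one frame.
    idx = knn_indices(xyz_t, xyz_t, k)          # (N, k), includes the point itself
    return feats_t[idx].max(axis=1)             # (N, C)

rng = np.random.default_rng(0)
T, N, C, d, k = 3, 64, 8, 16, 8
xyz = rng.normal(size=(T, N, 3))                # toy sequence of T frames
feats = rng.normal(size=(T, N, C))
Wq, Wk, Wv = (rng.normal(size=(C, d)) for _ in range(3))

out = spatio_temporal_attention(xyz, feats, t=1, k=k, Wq=Wq, Wk=Wk, Wv=Wv)
agg = neighbor_aggregate(xyz[1], feats[1], k)
print(out.shape)  # (64, 16)
```

Restricting attention to kNN neighbourhoods in adjacent frames keeps the cost linear in the number of points per frame times k, instead of quadratic over the whole sequence. Whether PST2 uses this kind of neighbourhood selection is not stated in the abstract.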

research
05/27/2022

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

Point cloud sequences are irregular and unordered in the spatial dimensi...
research
12/20/2020

Anchor-Based Spatial-Temporal Attention Convolutional Networks for Dynamic 3D Point Cloud Sequences

Recently, learning based methods for the robot perception from the image...
research
11/16/2021

SequentialPointNet: A strong parallelized point cloud sequence network for 3D action recognition

Point cloud sequences of 3D human actions exhibit unordered intra-frame ...
research
09/01/2022

MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition

Recognizing human actions from point cloud videos has attracted tremendo...
research
07/30/2022

Point Primitive Transformer for Long-Term 4D Point Cloud Video Understanding

This paper proposes a 4D backbone for long-term point cloud video unders...
research
02/28/2022

Spatiotemporal Transformer Attention Network for 3D Voxel Level Joint Segmentation and Motion Prediction in Point Cloud

Environment perception including detection, classification, tracking, an...
research
01/17/2022

Action Keypoint Network for Efficient Video Recognition

Reducing redundancy is crucial for improving the efficiency of video rec...