
Spatiotemporal Transformer Attention Network for 3D Voxel Level Joint Segmentation and Motion Prediction in Point Cloud

by Zhensong Wei, et al.

Environment perception, including detection, classification, tracking, and motion prediction, is a key enabler for automated driving systems and intelligent transportation applications. Fueled by advances in sensing technologies and machine learning techniques, LiDAR-based sensing systems have become a promising solution. The current challenges for this solution are how to effectively combine different perception tasks in a single backbone and how to efficiently learn spatiotemporal features directly from point cloud sequences. In this research, we propose a novel spatiotemporal attention network, based on the transformer self-attention mechanism, for joint semantic segmentation and motion prediction within a point cloud at the voxel level. The network is trained to simultaneously output the voxel-level class and the predicted motion by learning directly from a sequence of point cloud datasets. The proposed backbone includes both a temporal attention module (TAM) and a spatial attention module (SAM) to learn and extract the complex spatiotemporal features. The approach has been evaluated on the nuScenes dataset, and promising performance has been achieved.
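
The abstract does not give the internals of the TAM and SAM, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the point cloud sequence has already been voxelized into a feature tensor of shape (T, N, C) for T frames, N voxels, and C channels. It applies plain scaled dot-product self-attention first across frames for each voxel (a stand-in for the temporal module) and then across voxels within each frame (a stand-in for the spatial module), followed by two linear heads for class logits and 2D motion. All names (`self_attention`, `spatiotemporal_block`, `joint_heads`), the residual connections, the random weights, and the 2D motion output are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over the second-to-last axis of x.

    x has shape (batch, tokens, C); the weights have shape (C, C).
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def spatiotemporal_block(feats, params):
    """Temporal attention per voxel, then spatial attention per frame.

    feats: (T, N, C) voxel features. Returns an array of the same shape.
    """
    # Temporal step (assumed design): each voxel attends over its T frames.
    per_voxel = np.swapaxes(feats, 0, 1)                      # (N, T, C)
    temporal = per_voxel + self_attention(per_voxel, *params["temporal"])
    # Spatial step (assumed design): each frame attends over its N voxels.
    per_frame = np.swapaxes(temporal, 0, 1)                   # (T, N, C)
    return per_frame + self_attention(per_frame, *params["spatial"])


def joint_heads(feats, w_cls, w_motion):
    """Two linear heads on the latest frame: class logits and motion."""
    latest = feats[-1]                                        # (N, C)
    return latest @ w_cls, latest @ w_motion


# Toy sizes, chosen only for the demonstration.
T, N, C, K = 4, 6, 8, 5
rng = np.random.default_rng(0)
params = {
    name: [rng.normal(scale=0.1, size=(C, C)) for _ in range(3)]
    for name in ("temporal", "spatial")
}
feats = rng.normal(size=(T, N, C))

fused = spatiotemporal_block(feats, params)
class_logits, motion = joint_heads(
    fused, rng.normal(size=(C, K)), rng.normal(size=(C, 2))
)
print(fused.shape, class_logits.shape, motion.shape)
```

Sharing one fused feature tensor between both heads is what makes the prediction joint in this sketch: the segmentation and motion outputs are computed from the same spatiotemporal representation rather than from two separate backbones.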

