ContrastMotion: Self-supervised Scene Motion Learning for Large-Scale LiDAR Point Clouds
In this paper, we propose a novel self-supervised motion estimator for LiDAR-based autonomous driving via BEV representation. Different from usually adopted self-supervised strategies for data-level structure consistency, we predict scene motion via feature-level consistency between pillars in consecutive frames, which can eliminate the effect caused by noise points and view-changing point clouds in dynamic scenes. Specifically, we propose Soft Discriminative Loss that provides the network with more pseudo-supervised signals to learn discriminative and robust features in a contrastive learning manner. We also propose Gated Multi-frame Fusion block that learns valid compensation between point cloud frames automatically to enhance feature extraction. Finally, pillar association is proposed to predict pillar correspondence probabilities based on feature distance, and whereby further predicts scene motion. Extensive experiments show the effectiveness and superiority of our ContrastMotion on both scene flow and motion prediction tasks. The code is available soon.
READ FULL TEXT