CALICO: Self-Supervised Camera-LiDAR Contrastive Pre-training for BEV Perception

06/01/2023
by   Jiachen Sun, et al.

Perception is crucial to autonomous driving systems, where bird's eye view (BEV)-based architectures have recently reached state-of-the-art performance. Self-supervised representation learning is attractive because annotating 2D and 3D data is expensive and laborious. Although previous research has investigated pretraining methods for both LiDAR and camera-based 3D object detection, a unified pretraining framework for multimodal BEV perception is missing. In this study, we introduce CALICO, a novel framework that applies contrastive objectives to both LiDAR and camera backbones. Specifically, CALICO incorporates two stages: point-region contrast (PRC) and region-aware distillation (RAD). PRC better balances region- and scene-level representation learning on the LiDAR modality and offers significant performance improvement compared to existing methods. RAD effectively achieves contrastive distillation on our self-trained teacher model. CALICO's efficacy is substantiated by extensive evaluations on 3D object detection and BEV map segmentation tasks, where it delivers significant performance improvements. Notably, CALICO outperforms the baseline method by 10.5 mAP. Moreover, CALICO boosts the robustness of multimodal 3D object detection against adversarial attacks and corruption. Additionally, our framework can be tailored to different backbones and heads, positioning it as a promising approach for multimodal BEV perception.
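
For readers unfamiliar with the term, a "contrastive objective" can be illustrated with a generic InfoNCE loss. The sketch below is not CALICO's PRC or RAD formulation, which the abstract does not specify. It assumes paired embeddings (row i of one matrix matches row i of the other), and the function name `info_nce` and the temperature value are illustrative choices.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.07):
    """Generic InfoNCE loss; row i of `anchors` is paired with row i of `positives`.

    Illustrative only: this is the textbook contrastive loss, not the
    specific objective used by CALICO.
    """
    # L2-normalise so the dot product becomes a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                      # (N, N) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Diagonal entries are the positive pairs; other columns act as negatives.
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
loss_matched = info_nce(x, x)         # every row paired with itself
loss_shuffled = info_nce(x, x[::-1])  # every row paired with a different row
```

With correctly paired embeddings the loss is close to zero, and it grows when the pairing is broken, which is the signal such pretraining objectives use to pull matching features together and push others apart.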

research
03/30/2022

Image-to-Lidar Self-Supervised Distillation for Autonomous Driving Data

Segmenting or detecting objects in sparse Lidar point clouds are two imp...
research
02/09/2022

Point-Level Region Contrast for Object Detection Pre-Training

In this work we present point-level region contrast, a self-supervised p...
research
03/27/2023

UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View

In the field of 3D object detection for autonomous driving, the sensor p...
research
12/14/2022

MAELi – Masked Autoencoder for Large-Scale LiDAR Point Clouds

We show how the inherent, but often neglected, properties of large-scale...
research
09/24/2021

Dense Contrastive Visual-Linguistic Pretraining

Inspired by the success of BERT, several multimodal representation learn...
research
03/04/2021

SSTN: Self-Supervised Domain Adaptation Thermal Object Detection for Autonomous Driving

The sensibility and sensitivity of the environment play a decisive role ...
research
02/17/2023

Self-Supervised Representation Learning from Temporal Ordering of Automated Driving Sequences

Self-supervised feature learning enables perception systems to benefit f...
