X^3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection

03/03/2023
by Marvin Klingner, et al.

Recent advances in 3D object detection (3DOD) have produced remarkably strong results for LiDAR-based models. In contrast, surround-view 3DOD models based on multiple camera images underperform because features must be transformed from perspective view (PV) to a 3D world representation, and this view transformation is ambiguous without depth information. This paper introduces X^3KD, a comprehensive knowledge distillation framework across different modalities, tasks, and stages for multi-camera 3DOD. Specifically, we propose cross-task distillation from an instance segmentation teacher (X-IS) in the PV feature extraction stage, which provides supervision without ambiguous error backpropagation through the view transformation. After the transformation, we apply cross-modal feature distillation (X-FD) and adversarial training (X-AT) to improve the 3D world representation of multi-camera features using the information contained in a LiDAR-based 3DOD teacher. Finally, we also employ this teacher for cross-modal output distillation (X-OD), providing dense supervision at the prediction stage. We perform extensive ablations of knowledge distillation at different stages of multi-camera 3DOD. Our final X^3KD model outperforms previous state-of-the-art approaches on the nuScenes and Waymo datasets and generalizes to RADAR-based 3DOD. A video of qualitative results is available at https://youtu.be/1do9DPFmr38.
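The abstract does not give the exact loss functions, so the following is only a minimal sketch of what feature-level and output-level distillation from a teacher to a student commonly look like. It assumes a mean squared error between equally shaped feature grids for the feature stage and a KL divergence between softened class scores for the output stage. The function names, the `temperature` parameter, and the toy inputs are hypothetical and are not taken from the paper.

```python
import math


def feature_distillation_loss(student, teacher):
    """Mean squared error between two equally shaped 2D feature grids.

    Assumption: a plain MSE stands in for feature distillation; the
    paper's actual X-FD loss is not specified in the abstract.
    """
    if len(student) != len(teacher):
        raise ValueError("feature grids must have the same number of rows")
    total, count = 0.0, 0
    for s_row, t_row in zip(student, teacher):
        if len(s_row) != len(t_row):
            raise ValueError("feature grids must have the same row length")
        for s, t in zip(s_row, t_row):
            total += (s - t) ** 2
            count += 1
    return total / count


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def output_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) over class scores for a single grid cell.

    Assumption: a generic soft-label KL term stands in for output
    distillation; the paper's actual X-OD loss may differ.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy usage with made-up numbers.
student_features = [[1.0, 2.0], [3.0, 4.0]]
teacher_features = [[1.0, 2.0], [3.0, 6.0]]
fd = feature_distillation_loss(student_features, teacher_features)
od = output_distillation_loss([0.2, 1.5, -0.3], [0.1, 2.0, -1.0], temperature=2.0)
```

In a real training setup these terms would be weighted and added to the student's detection loss, with the teacher kept frozen; the weighting is a design choice that this sketch leaves out.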


research
03/27/2023

UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View

In the field of 3D object detection for autonomous driving, the sensor p...
research
11/17/2022

BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection

3D object detection from multiple image views is a fundamental and chall...
research
08/08/2021

Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection

In video understanding, most cross-modal knowledge distillation (KD) met...
research
04/19/2023

CrossFusion: Interleaving Cross-modal Complementation for Noise-resistant 3D Object Detection

The combination of LiDAR and camera modalities is proven to be necessary...
research
09/20/2023

Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation

Sound can convey significant information for spatial reasoning in our da...
research
04/01/2020

Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing

In recent years, cross-modal hashing (CMH) has attracted increasing atte...
research
06/28/2023

A Dimensional Structure based Knowledge Distillation Method for Cross-Modal Learning

Due to limitations in data quality, some essential visual tasks are diff...
