Depth Is All You Need for Monocular 3D Detection

10/05/2022
by Dennis Park, et al.

A key contributor to recent progress in 3D detection from single images is monocular depth estimation. Existing methods focus on how to leverage depth explicitly, by generating pseudo-pointclouds or providing attention cues for image features. More recent works leverage depth prediction as a pretraining task and fine-tune the depth representation while training it for 3D detection. However, this adaptation is insufficient and is limited in scale by manual labels. In this work, we propose to further align the depth representation with the target domain in an unsupervised fashion. Our methods leverage commonly available LiDAR or RGB videos at training time to fine-tune the depth representation, which leads to improved 3D detectors. Especially when using RGB videos, we show that our two-stage training, which first generates pseudo-depth labels, is critical because of the inconsistency in loss distribution between the two tasks. With either type of reference data, our multi-task learning approach improves over the state of the art on both KITTI and NuScenes, while matching the test-time complexity of its single-task sub-network.
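
As a rough illustration of the multi-task idea described in the abstract, the sketch below combines a detection loss with an auxiliary depth loss supervised only at pixels that have a LiDAR return. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `masked_l1_depth_loss` and `multitask_loss`, the L1 form of the depth loss, the convention that a LiDAR depth of 0 means "no return", and the `depth_weight` parameter are all hypothetical choices made for illustration.

```python
from typing import Sequence


def masked_l1_depth_loss(pred: Sequence[float], lidar: Sequence[float]) -> float:
    """Mean absolute depth error over pixels with a LiDAR return.

    Assumption: a LiDAR depth of 0 marks a pixel without a measurement,
    so such pixels are excluded from the loss.
    """
    valid = [(p, d) for p, d in zip(pred, lidar) if d > 0.0]
    if not valid:
        # No supervised pixels in this sample: contribute nothing.
        return 0.0
    return sum(abs(p - d) for p, d in valid) / len(valid)


def multitask_loss(det_loss: float, depth_loss: float, depth_weight: float = 1.0) -> float:
    """Weighted sum of the detection loss and the auxiliary depth loss.

    Assumption: a simple linear weighting; the weighting scheme is illustrative.
    """
    return det_loss + depth_weight * depth_loss


# Toy example: four pixels, two of which have LiDAR measurements.
pred = [10.0, 20.0, 30.0, 40.0]
lidar = [12.0, 0.0, 27.0, 0.0]

depth_loss = masked_l1_depth_loss(pred, lidar)
total = multitask_loss(det_loss=1.0, depth_loss=depth_loss, depth_weight=0.5)
print(depth_loss, total)
```

Because the depth term is used only during training in this sketch, dropping it at inference leaves the detection branch unchanged, which is consistent with the abstract's claim about test-time complexity.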

research
08/13/2021

Is Pseudo-Lidar needed for Monocular 3D Object detection?

Recent progress in 3D object detection from single images leverages mono...
research
07/28/2021

Pseudo-LiDAR Based Road Detection

Road detection is a critically important task for self-driving cars. By ...
research
10/29/2022

Boosting Monocular 3D Object Detection with Object-Centric Auxiliary Depth Supervision

Recent advances in monocular 3D detection leverage a depth estimation ne...
research
08/21/2022

Multi-task Learning for Monocular Depth and Defocus Estimations with Real Images

Monocular depth estimation and defocus estimation are two fundamental ta...
research
07/27/2023

Learning Depth Estimation for Transparent and Mirror Surfaces

Inferring the depth of transparent or mirror (ToM) surfaces represents a...
research
05/10/2023

A Multi-modal Approach to Single-modal Visual Place Classification

Visual place classification from a first-person-view monocular RGB image...
research
05/15/2022

Promoting Saliency From Depth: Deep Unsupervised RGB-D Saliency Detection

Growing interests in RGB-D salient object detection (RGB-D SOD) have bee...
