A Simple Baseline for Multi-Camera 3D Object Detection

08/22/2022
by Yunpeng Zhang, et al.

3D object detection with surrounding cameras has been a promising direction for autonomous driving. In this paper, we present SimMOD, a Simple baseline for Multi-camera Object Detection, to address this task. To incorporate multi-view information while building on previous efforts in monocular 3D object detection, the framework is built on sample-wise object proposals and works in two stages. First, we extract multi-scale features and generate perspective object proposals on each monocular image. Second, the multi-view proposals are aggregated and then iteratively refined with multi-view and multi-scale visual features in the DETR3D style. The refined proposals are decoded end-to-end into the detection results. To further boost performance, we add auxiliary branches alongside the proposal generation to enhance feature learning. We also design target filtering and teacher forcing to promote the consistency of the two-stage training. We conduct extensive experiments on the nuScenes 3D object detection benchmark to demonstrate the effectiveness of SimMOD and achieve new state-of-the-art performance. Code will be available at https://github.com/zhangyp15/SimMOD.
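
The second stage refines proposals with multi-view visual features "in the DETR3D style". As a rough illustration of that idea only, the NumPy sketch below projects 3D reference points into each camera and averages the features sampled from the views where the point is visible. It is not the authors' implementation: the function names, the single feature scale, the 4x4 projection matrices, and the nearest-neighbor lookup (used here in place of bilinear sampling) are all assumptions made for this example.

```python
import numpy as np


def project_points(points_3d, proj):
    """Project (N, 3) points with a hypothetical (4, 4) camera projection matrix.

    Returns (N, 2) pixel coordinates and (N,) depths along the camera axis.
    """
    ones = np.ones((len(points_3d), 1))
    homo = np.concatenate([points_3d, ones], axis=1)  # (N, 4) homogeneous points
    cam = homo @ proj.T                               # (N, 4) projected coordinates
    depth = cam[:, 2]
    # Avoid division by zero for points lying on the camera plane.
    safe_depth = np.where(np.abs(depth) < 1e-5, 1e-5, depth)
    pixels = cam[:, :2] / safe_depth[:, None]
    return pixels, depth


def sample_multiview_features(points_3d, projs, feats):
    """Average per-view features at the projections of each 3D point.

    points_3d: (N, 3) reference points in a shared 3D frame.
    projs:     (V, 4, 4) projection matrices, one per camera.
    feats:     (V, C, H, W) feature maps, one per camera (single scale).
    Returns (N, C) averaged features and an (N, V) visibility mask.
    """
    num_views, channels, height, width = feats.shape
    num_points = len(points_3d)
    summed = np.zeros((num_points, channels))
    valid = np.zeros((num_points, num_views), dtype=bool)
    for v in range(num_views):
        pixels, depth = project_points(points_3d, projs[v])
        # Clip before rounding so far-away projections cannot overflow the int cast.
        cols = np.rint(np.clip(pixels[:, 0], -1.0, width)).astype(int)
        rows = np.rint(np.clip(pixels[:, 1], -1.0, height)).astype(int)
        # A point counts only if it is in front of the camera and inside the map.
        ok = (depth > 0) & (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)
        valid[:, v] = ok
        # Nearest-neighbor lookup; the fancy index yields (C, K), transposed to (K, C).
        summed[ok] += feats[v][:, rows[ok], cols[ok]].T
    # Points visible in no view keep a zero feature vector.
    counts = np.maximum(valid.sum(axis=1, keepdims=True), 1)
    return summed / counts, valid
```

In an iterative refinement loop of the kind the abstract describes, the averaged features would feed an update to each proposal, and the updated reference points would be re-projected in the next iteration. That loop and the multi-scale handling are left out of this sketch.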
