CAPE: Camera View Position Embedding for Multi-View 3D Object Detection

03/17/2023
by Kaixin Xiong, et al.

In this paper, we address the problem of detecting 3D objects from multi-view images. Current query-based methods rely on global 3D position embeddings (PE) to learn the geometric correspondence between images and 3D space. We claim that directly interacting 2D image features with global 3D PE can make the view transformation harder to learn because of the variation in camera extrinsics. We therefore propose a novel method based on CAmera view Position Embedding, called CAPE. We form the 3D position embeddings in the local camera-view coordinate system instead of the global coordinate system, so that the 3D position embedding does not need to encode camera extrinsic parameters. Furthermore, we extend CAPE to temporal modeling by exploiting the object queries of previous frames and encoding the ego-motion to boost 3D object detection. CAPE achieves state-of-the-art performance (61.0% NDS and 52.5% mAP) among all LiDAR-free methods on the nuScenes dataset. Code and models are available on Paddle3D (https://github.com/PaddlePaddle/Paddle3D) and as a PyTorch implementation (https://github.com/kaixinbear/CAPE).
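
The distinction between camera-view and global 3D positions can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simple pinhole camera, and the function names, intrinsics, pixel grid, depth values, and mounting poses are all hypothetical. It only shows that points back-projected with intrinsics alone are the same for every camera sharing those intrinsics, while their global counterparts change with each camera's extrinsics.

```python
import math

def camera_view_points(fx, fy, cx, cy, pixels, depths):
    """Back-project pixels to 3D points in the camera frame (intrinsics only)."""
    points = []
    for u, v in pixels:
        for d in depths:
            # Pinhole model: x = (u - cx) / fx * z, y = (v - cy) / fy * z, z = d
            points.append(((u - cx) / fx * d, (v - cy) / fy * d, d))
    return points

def to_global(points, rotation, translation):
    """Apply a camera-to-global rigid transform (extrinsics) to camera-frame points."""
    out = []
    for p in points:
        out.append(tuple(
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        ))
    return out

def yaw_rotation(angle):
    """Rotation about the z axis; an arbitrary illustrative choice of extrinsic rotation."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Two hypothetical cameras sharing intrinsics but mounted with different poses.
pixels = [(400.0, 225.0), (800.0, 450.0)]
depths = [5.0, 20.0]
local = camera_view_points(1000.0, 1000.0, 800.0, 450.0, pixels, depths)
front = to_global(local, yaw_rotation(0.0), (1.5, 0.0, 1.6))
back = to_global(local, yaw_rotation(math.pi), (-1.5, 0.0, 1.6))
```

Here `local` is shared by both cameras, whereas `front` and `back` differ because each depends on its own rotation and translation. A position embedding computed from `local` therefore carries no extrinsic information, which is the property the abstract describes.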

research
03/10/2022

PETR: Position Embedding Transformation for Multi-View 3D Object Detection

In this paper, we develop position embedding transformation (PETR) for m...
research
03/15/2023

RefiNeRF: Modelling dynamic neural radiance fields with inconsistent or missing camera parameters

Novel view synthesis (NVS) is a challenging task in computer vision that...
research
11/28/2022

Toward Global Sensing Quality Maximization: A Configuration Optimization Scheme for Camera Networks

The performance of a camera network monitoring a set of targets depends ...
research
06/22/2022

Polar Parametrization for Vision-based Surround-View 3D Detection

3D detection based on surround-view camera system is a critical techniqu...
research
06/02/2022

PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images

In this paper, we propose PETRv2, a unified framework for 3D perception ...
research
07/06/2021

Rethinking Positional Encoding

It is well noted that coordinate based MLPs benefit greatly – in terms o...
research
07/08/2022

Multi-view Attention for gestational age at birth prediction

We present our method for gestational age at birth prediction for the SL...
