Lift, Splat, Shoot: Encoding Images From Arbitrary Camera Rigs by Implicitly Unprojecting to 3D

08/13/2020
by Jonah Philion, et al.

The goal of perception for autonomous vehicles is to extract semantic representations from multiple sensors and fuse these representations into a single "bird's-eye-view" coordinate frame for consumption by motion planning. We propose a new end-to-end architecture that directly extracts a bird's-eye-view representation of a scene given image data from an arbitrary number of cameras. The core idea behind our approach is to "lift" each image individually into a frustum of features for each camera, then "splat" all frustums into a rasterized bird's-eye-view grid. By training on the entire camera rig, we provide evidence that our model is able to learn not only how to represent images but how to fuse predictions from all cameras into a single cohesive representation of the scene while being robust to calibration error. On standard bird's-eye-view tasks such as object segmentation and map segmentation, our model outperforms all baselines and prior work. In pursuit of the goal of learning dense representations for motion planning, we show that the representations inferred by our model enable interpretable end-to-end motion planning by "shooting" template trajectories into a bird's-eye-view cost map output by our network. We benchmark our approach against models that use oracle depth from lidar. Project page with code: https://nv-tlabs.github.io/lift-splat-shoot .
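To make the three stages concrete, here is a minimal, self-contained PyTorch sketch of "lift", "splat", and "shoot" for a single camera. All names, tensor shapes, and grid bounds below (lift, splat, shoot, XBOUND, YBOUND, the 41 depth bins, the index_add_-based pooling) are illustrative assumptions for this sketch, not the authors' released API; see the project page above for the reference implementation.

```python
import torch

# Illustrative sizes, not the paper's exact configuration.
D, C = 41, 64                 # discrete depth bins, feature channels
H, W = 8, 22                  # downsampled feature-map height and width
XBOUND = (-50.0, 50.0, 0.5)   # BEV x range and cell size in meters
YBOUND = (-50.0, 50.0, 0.5)   # BEV y range and cell size in meters


def lift(head_out: torch.Tensor) -> torch.Tensor:
    """Lift: turn a (D + C, H, W) per-camera head output into a
    (D, C, H, W) frustum of features.

    The first D channels are treated as logits over discrete depth bins;
    the remaining C channels are context features. Their outer product
    places a feature vector at every (depth, pixel) cell of the frustum.
    """
    depth = head_out[:D].softmax(dim=0)               # (D, H, W)
    context = head_out[D:]                            # (C, H, W)
    return depth.unsqueeze(1) * context.unsqueeze(0)  # (D, C, H, W)


def splat(frustum: torch.Tensor, points_ego: torch.Tensor) -> torch.Tensor:
    """Splat: sum-pool frustum features into a rasterized BEV grid.

    points_ego holds the 3D ego-frame location of every frustum cell,
    shape (D, H, W, 3), precomputed from camera intrinsics/extrinsics.
    """
    nx = int((XBOUND[1] - XBOUND[0]) / XBOUND[2])
    ny = int((YBOUND[1] - YBOUND[0]) / YBOUND[2])

    feats = frustum.permute(0, 2, 3, 1).reshape(-1, C)  # (D*H*W, C)
    pts = points_ego.reshape(-1, 3)
    ix = ((pts[:, 0] - XBOUND[0]) / XBOUND[2]).long()
    iy = ((pts[:, 1] - YBOUND[0]) / YBOUND[2]).long()
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)

    bev = torch.zeros(nx * ny, C)
    bev.index_add_(0, ix[keep] * ny + iy[keep], feats[keep])  # sum per cell
    return bev.view(nx, ny, C).permute(2, 0, 1)               # (C, nx, ny)


def shoot(cost_map: torch.Tensor, templates: torch.Tensor) -> torch.Tensor:
    """Shoot: score K template trajectories against a (nx, ny) BEV cost map.

    templates holds integer grid indices of shape (K, T, 2); each
    trajectory's cost is the sum of the map cells it passes through,
    and the planner selects the argmin.
    """
    return cost_map[templates[..., 0], templates[..., 1]].sum(dim=1)  # (K,)
```

With a multi-camera rig, the lifted frustums from every camera are pooled into the same grid, so the splat step is where cross-camera fusion happens; the paper accelerates this sum-pooling with a cumulative-sum trick, for which the simpler index_add_ call above stands in.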

Related research

08/13/2020 · Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations
In this paper we propose a novel end-to-end learnable network that perfo...

06/08/2022 · Learning Ego 3D Representation as Ray Tracing
A self-driving perception model aims to extract 3D semantic representati...

06/05/2023 · Scene as Occupancy
A human driver can easily describe the complex traffic scene by visual sys...

06/08/2023 · UAP-BEV: Uncertainty Aware Planning using Bird's Eye View generated from Surround Monocular Images
Autonomous driving requires accurate reasoning of the location of object...

03/15/2023 · DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception
BEV perception is of great importance in the field of autonomous driving...

06/26/2017 · Learning to Map Vehicles into Bird's Eye View
Awareness of the road scene is an essential component for both autonomou...

03/30/2020 · Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks
Autonomous vehicles commonly rely on highly detailed bird's-eye-view maps...
