MonoLayout: Amodal scene layout from a single image

02/19/2020
by Kaustubh Mani, et al.

In this paper, we address the novel, highly challenging problem of estimating the layout of a complex urban driving scenario. Given a single color image captured from a driving platform, we aim to predict the bird's-eye view layout of the road and other traffic participants. The estimated layout should reason beyond what is visible in the image, and compensate for the loss of 3D information due to projection. We dub this problem amodal scene layout estimation, which involves "hallucinating" scene layout even for parts of the world that are occluded in the image. To this end, we present MonoLayout, a deep neural network for real-time amodal scene layout estimation from a single image. We represent scene layout as a multi-channel semantic occupancy grid, and leverage adversarial feature learning to hallucinate plausible completions for occluded image parts. Due to the lack of fair baseline methods, we extend several state-of-the-art approaches for road-layout estimation and vehicle occupancy estimation in bird's-eye view to the amodal setup for rigorous evaluation. By leveraging temporal sensor fusion to generate training labels, we significantly outperform the current state of the art over a number of datasets. On the KITTI and Argoverse datasets, we outperform all baselines by a significant margin. We also make all our annotations and code publicly available. A video abstract of this paper is available at https://www.youtube.com/watch?v=HcroGyo6yRQ .
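
To make the phrase "multi-channel semantic occupancy grid" concrete, the sketch below builds a small bird's-eye view grid with one binary channel per semantic class and marks the cells that contain ground-plane points. It is only an illustration of the data structure, not the MonoLayout implementation: the channel names (`ROAD`, `VEHICLE`), the helper functions `make_grid` and `mark_points`, the grid size, and the metric extent are all assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical channel indices; the abstract only says "multi-channel".
ROAD, VEHICLE = 0, 1

Grid = List[List[List[int]]]  # indexed as grid[channel][row][col]


def make_grid(channels: int, size: int) -> Grid:
    """Create an all-zero square occupancy grid with one layer per channel."""
    return [[[0] * size for _ in range(size)] for _ in range(channels)]


def mark_points(grid: Grid, channel: int,
                points_xz: Sequence[Tuple[float, float]],
                extent_m: float) -> None:
    """Mark cells of one channel as occupied.

    Assumed convention: x is lateral, in [-extent_m/2, extent_m/2);
    z is forward distance from the camera, in [0, extent_m).
    Points outside that square are ignored.
    """
    size = len(grid[channel])
    cell = extent_m / size  # metres covered by one cell
    for x, z in points_xz:
        # floor (not int) so slightly negative offsets fall outside the grid
        col = math.floor((x + extent_m / 2.0) / cell)
        row = math.floor(z / cell)
        if 0 <= row < size and 0 <= col < size:
            grid[channel][row][col] = 1


# A 2-channel, 4x4 grid covering an assumed 8 m x 8 m area (2 m cells).
grid = make_grid(channels=2, size=4)
mark_points(grid, ROAD, [(0.0, 1.0), (10.0, 1.0)], extent_m=8.0)
mark_points(grid, VEHICLE, [(-3.0, 7.9), (-4.1, 1.0)], extent_m=8.0)
```

Each channel is an independent binary map, so a single cell can be marked in more than one channel at once (for example, a vehicle standing on the road). In an amodal setting, the training labels for such a grid would also cover cells that are occluded in the input image.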


research
08/20/2021

AutoLay: Benchmarking amodal layout estimation for autonomous driving

Given an image or a video captured from a monocular camera, amodal layou...
research
03/28/2018

Learning to Look around Objects for Top-View Representations of Outdoor Scenes

Given a single RGB image of a complex outdoor road scene in the perspect...
research
03/16/2021

RackLay: Multi-Layer Layout Estimation for Warehouse Racks

Given a monocular colour image of a warehouse rack, we aim to predict th...
research
11/15/2022

Monocular BEV Perception of Road Scenes via Front-to-Top View Projection

HD map reconstruction is crucial for autonomous driving. LiDAR-based met...
research
12/14/2018

A Parametric Top-View Representation of Complex Road Scenes

In this paper, we address the problem of inferring the layout of complex...
research
09/19/2022

A Dual-Cycled Cross-View Transformer Network for Unified Road Layout Estimation and 3D Object Detection in the Bird's-Eye-View

The bird's-eye-view (BEV) representation allows robust learning of multi...
research
03/21/2022

Self-Supervised Road Layout Parsing with Graph Auto-Encoding

Aiming for higher-level scene understanding, this work presents a neural...
