Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Auto-Encoders

04/06/2018
by Chenyang Lu, et al.

In this work, we investigate and evaluate the use of convolutional variational auto-encoders for end-to-end learning of semantic-metric occupancy grids from monocular data. The network learns to predict four different classes, as well as a camera-to-bird's-eye-view mapping, which is shown to be more robust than using a fixed-plane assumption. At its core, it uses a variational auto-encoder (VAE) that encodes the semantic-metric information of the driving scene and subsequently decodes it into a 2-D planar polar coordinate system. Even without stereo or IMU data, this VAE approach is robust to pitch and roll perturbations of the camera view. Evaluations on Cityscapes show that our end-to-end learning of semantic-metric occupancy grids achieves 59.0 IoU, compared to 49.2 IoU for a baseline using the fixed-plane assumption. Furthermore, the network achieves real-time inference rates of approx. 65 Hz for an input with a resolution of 256×512 pixels.
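To make the encoder-decoder idea concrete, the following is a minimal, hypothetical PyTorch sketch of a convolutional variational encoder-decoder that maps a monocular RGB image to a small multi-class occupancy grid. All layer widths, the latent size, the Cartesian 64×64 output grid, and the loss weighting are illustrative assumptions; they do not reproduce the exact architecture or the polar-grid decoding described in the paper.

```python
# Illustrative sketch only: layer sizes, latent dimension, and the 64x64 Cartesian
# output grid are assumptions, not the values used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OccupancyVAE(nn.Module):
    def __init__(self, latent_dim=128, num_classes=4):
        super().__init__()
        # Encoder: downsample the monocular RGB input into a compact feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 256x512 -> 128x256
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # -> 64x128
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # -> 32x64
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(), # -> 16x32
            nn.AdaptiveAvgPool2d((4, 8)),
            nn.Flatten(),
        )
        feat = 256 * 4 * 8
        self.fc_mu = nn.Linear(feat, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(feat, latent_dim)  # log-variance of q(z|x)

        # Decoder: expand the latent code into a grid with one logit per class.
        self.fc_dec = nn.Linear(latent_dim, 256 * 4 * 4)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),  # 4x4 -> 8x8
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # -> 16x16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # -> 32x32
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),      # -> 64x64
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients w.r.t. mu, logvar.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        grid_logits = self.decoder(self.fc_dec(z).view(-1, 256, 4, 4))
        return grid_logits, mu, logvar


def vae_loss(grid_logits, target_grid, mu, logvar, beta=1.0):
    # Per-cell cross-entropy over the semantic classes plus the KL regularizer of the VAE.
    ce = F.cross_entropy(grid_logits, target_grid)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return ce + beta * kl
```

A forward pass on a 256×512 image, e.g. `OccupancyVAE()(torch.randn(1, 3, 256, 512))`, returns grid logits of shape (1, 4, 64, 64) together with the posterior mean and log-variance used for the KL term.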
