Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Auto-Encoders

04/06/2018
by Chenyang Lu, et al.

In this work, we investigate and evaluate the use of convolutional variational auto-encoders for end-to-end learning of semantic-metric occupancy grids from monocular data. The network learns to predict four different classes, as well as a camera-to-bird's-eye-view mapping, which is shown to be more robust than using a fixed-plane assumption. At its core, it uses a variational auto-encoder (VAE) that encodes the semantic-metric information of the driving scene and subsequently decodes it into a 2-D planar polar coordinate system. Even without stereo or IMU data, this VAE approach is robust to pitch and roll perturbations of the camera view. Evaluations on Cityscapes show that our end-to-end learning of semantic-metric occupancy grids achieves 59.0 IoU, compared to 49.2 IoU for a baseline using the fixed-plane assumption. Furthermore, the network achieves real-time inference rates of approx. 65 Hz for an input with a resolution of 256×512 pixels.
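To make the encoder-decoder idea concrete, the following is a minimal, hypothetical PyTorch sketch of a convolutional variational encoder-decoder that maps a monocular RGB image to a small multi-class occupancy grid. All layer widths, the latent size, the Cartesian 64×64 output grid, and the loss weighting are illustrative assumptions; they do not reproduce the exact architecture or the polar-grid decoding described in the paper.

```python
# Illustrative sketch only: layer sizes, latent dimension, and the 64x64 Cartesian
# output grid are assumptions, not the values used in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OccupancyVAE(nn.Module):
    def __init__(self, latent_dim=128, num_classes=4):
        super().__init__()
        # Encoder: downsample the monocular RGB input into a compact feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 256x512 -> 128x256
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # -> 64x128
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # -> 32x64
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(), # -> 16x32
            nn.AdaptiveAvgPool2d((4, 8)),
            nn.Flatten(),
        )
        feat = 256 * 4 * 8
        self.fc_mu = nn.Linear(feat, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(feat, latent_dim)  # log-variance of q(z|x)

        # Decoder: expand the latent code into a grid with one logit per class.
        self.fc_dec = nn.Linear(latent_dim, 256 * 4 * 4)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),  # 4x4 -> 8x8
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # -> 16x16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # -> 32x32
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),      # -> 64x64
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients w.r.t. mu, logvar.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        grid_logits = self.decoder(self.fc_dec(z).view(-1, 256, 4, 4))
        return grid_logits, mu, logvar


def vae_loss(grid_logits, target_grid, mu, logvar, beta=1.0):
    # Per-cell cross-entropy over the semantic classes plus the KL regularizer of the VAE.
    ce = F.cross_entropy(grid_logits, target_grid)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return ce + beta * kl
```

A forward pass on a 256×512 image, e.g. `OccupancyVAE()(torch.randn(1, 3, 256, 512))`, returns grid logits of shape (1, 4, 64, 64) together with the posterior mean and log-variance used for the KL term.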
