Cross-view Semantic Segmentation for Sensing Surroundings

06/09/2019
by Bowen Pan, et al.

Sensing surroundings is ubiquitous and effortless to humans: it takes a single glance to extract the spatial configuration of objects and the free space from a scene. To equip machine vision with spatial understanding capabilities, we introduce the View Parsing Network (VPN) for cross-view semantic segmentation. In this framework, the first-view observations are parsed into a top-down-view semantic map indicating precise object locations. VPN contains a view transformer module, designed to aggregate the first-view observations taken from multiple angles and modalities in order to draw a bird-view semantic map. We evaluate the VPN framework for cross-view segmentation on two types of environments: indoor scenes and driving-traffic scenes. Experimental results show that our model accurately predicts the top-down-view semantic mask of the visible objects from the first-view observations, and that it infers the location of contextually relevant objects even when they are not visible.
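To make the idea of aggregating several first-view observations into one top-down map concrete, here is a minimal sketch in plain Python. It is not the VPN architecture: the abstract does not specify how the view transformer module works, so the per-view linear map over flattened spatial positions, the element-wise sum across views, and all function names (`make_view_transform`, `transform_view`, `aggregate_views`, `label_map`) are assumptions made purely for illustration. The weights are random rather than learned.

```python
import random


def make_view_transform(n_src, n_dst, rng):
    """Hypothetical per-view linear map (n_dst x n_src) from flattened
    first-view positions to flattened top-down positions."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_src)] for _ in range(n_dst)]


def transform_view(feature, weights):
    """Apply the linear map to every channel of one view.

    feature: list of C channels, each a flat list of n_src values.
    Returns a list of C channels, each a flat list of n_dst values.
    """
    return [
        [sum(w * x for w, x in zip(row, channel)) for row in weights]
        for channel in feature
    ]


def aggregate_views(features, transforms):
    """Transform each view, then sum the results element-wise (assumed fusion)."""
    transformed = [transform_view(f, w) for f, w in zip(features, transforms)]
    n_channels = len(transformed[0])
    n_dst = len(transformed[0][0])
    return [
        [sum(t[c][p] for t in transformed) for p in range(n_dst)]
        for c in range(n_channels)
    ]


def label_map(top_down):
    """Treat each channel as a class score and take the per-cell argmax."""
    n_dst = len(top_down[0])
    return [
        max(range(len(top_down)), key=lambda c: top_down[c][p])
        for p in range(n_dst)
    ]


# Toy setup: 4 views, 3 channels (used here as class scores), 4x4 grids.
rng = random.Random(0)
n_views, n_channels, n_cells = 4, 3, 16
features = [
    [[rng.random() for _ in range(n_cells)] for _ in range(n_channels)]
    for _ in range(n_views)
]
transforms = [make_view_transform(n_cells, n_cells, rng) for _ in range(n_views)]

top_down = aggregate_views(features, transforms)
labels = label_map(top_down)  # one class index per top-down cell
```

In a trained system the maps would be learned from paired first-view and top-down data, and a decoder would normally replace the bare argmax; the sketch only shows the data flow from several views into a single top-down grid.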


