STD2P: RGBD Semantic Segmentation Using Spatio-Temporal Data-Driven Pooling

04/08/2016
by Yang He, et al.

We propose a novel superpixel-based multi-view convolutional neural network for semantic image segmentation. The proposed network produces a high-quality segmentation of a single image by leveraging information from additional views of the same scene. Particularly in indoor videos, such as those captured by robotic platforms or by handheld and body-worn RGBD cameras, nearby video frames provide diverse viewpoints and additional context of objects and scenes. To leverage such information, we first compute region correspondences using optical flow and image boundary-based superpixels. Given these region correspondences, we propose a novel spatio-temporal pooling layer to aggregate information over space and time. We evaluate our approach on the NYU-Depth-V2 and SUN3D datasets and compare it to various state-of-the-art single-view and multi-view approaches. Besides a general improvement over the state of the art, we also show the benefits of making use of unlabeled frames during training for multi-view as well as single-view prediction.
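
The idea of pooling over corresponding regions in space and time can be illustrated with a small NumPy sketch. This is not the authors' layer: it assumes that region correspondences have already been computed (here, matched superpixels simply share an integer id across frames), that unmatched pixels are marked with -1, and that aggregation is a plain average. The function names `spatio_temporal_pool` and `unpool_to_frame` are hypothetical.

```python
import numpy as np

def spatio_temporal_pool(features, regions, num_regions):
    """Average per-pixel features over each region, across all frames.

    features: float array of shape (T, H, W, C), one feature map per frame.
    regions:  int array of shape (T, H, W); matched superpixels share an id
              in [0, num_regions), and -1 marks pixels without a match
              (an assumed convention, not taken from the paper).
    Returns an array of shape (num_regions, C); empty regions stay zero.
    """
    channels = features.shape[-1]
    flat_feat = features.reshape(-1, channels)
    flat_ids = regions.reshape(-1)
    valid = flat_ids >= 0

    # Accumulate feature sums per region over every pixel of every frame.
    sums = np.zeros((num_regions, channels), dtype=np.float64)
    np.add.at(sums, flat_ids[valid], flat_feat[valid])

    # Count contributing pixels; clamp to 1 to avoid dividing by zero.
    counts = np.bincount(flat_ids[valid], minlength=num_regions)
    return sums / np.maximum(counts, 1)[:, None]

def unpool_to_frame(pooled, frame_regions):
    """Copy each region's pooled feature back to its pixels in one frame."""
    out = np.zeros(frame_regions.shape + (pooled.shape[1],))
    mask = frame_regions >= 0
    out[mask] = pooled[frame_regions[mask]]
    return out

# Two 1x2 frames with one channel; region 0 is visible in both frames.
features = np.array([[[[1.0], [3.0]]], [[[5.0], [7.0]]]])
regions = np.array([[[0, 1]], [[0, -1]]])
pooled = spatio_temporal_pool(features, regions, num_regions=2)
target = unpool_to_frame(pooled, regions[0])
```

In this toy input, region 0 averages one pixel from each frame, while region 1 only sees the first frame, so the additional view changes the feature of the target frame only where a correspondence exists.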

research
05/30/2022

MVMO: A Multi-Object Dataset for Wide Baseline Multi-View Semantic Segmentation

We present MVMO (Multi-View, Multi-Object dataset): a synthetic dataset ...
research
04/11/2020

Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos

We leverage unsupervised learning of depth, egomotion, and camera intrin...
research
09/27/2018

Multi-View Frame Reconstruction with Conditional GAN

Multi-view frame reconstruction is an important problem particularly whe...
research
05/27/2020

4D Visualization of Dynamic Events from Unconstrained Multi-View Videos

We present a data-driven approach for 4D space-time visualization of dyn...
research
03/17/2022

Transframer: Arbitrary Frame Prediction with Generative Models

We present a general-purpose framework for image modelling and vision ta...
research
10/04/2022

COPILOT: Human Collision Prediction and Localization from Multi-view Egocentric Videos

To produce safe human motions, assistive wearable exoskeletons must be e...
research
10/14/2021

DeepMoCap: Deep Optical Motion Capture Using Multiple Depth Sensors and Retro-Reflectors

In this paper, a marker-based, single-person optical motion capture meth...
