Soft Correspondences in Multimodal Scene Parsing

09/28/2017
by Sarah Taghavi Namin, et al.

Exploiting multiple modalities for semantic scene parsing has been shown to improve accuracy over the single-modality scenario. However, multimodal datasets often suffer from problems such as data misalignment and label inconsistencies, while existing methods assume that corresponding regions in two modalities must have identical labels. We propose to address this issue by formulating multimodal semantic labeling as inference in a conditional random field (CRF) and introducing latent nodes that explicitly model inconsistencies between the two modalities. These latent nodes allow us not only to leverage information from both domains to improve their labeling, but also to cut the edges between inconsistent regions. We learn the intra-domain and inter-domain potential functions from training data to avoid hand-tuning the model parameters. We evaluate our approach on two publicly available datasets containing 2D and 3D data. Thanks to our latent nodes and our learning strategy, our method outperforms the state of the art in both cases. Moreover, to highlight the benefits of geometric information and the potential of our method for simultaneous 2D/3D semantic and geometric inference, we jointly inferred semantic and geometric classes in both 2D and 3D, which improved the labeling results on both datasets.
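
To make the role of a latent node concrete, the following is a minimal toy sketch in Python, not the authors' model. It assumes a single 2D region and a single 3D region joined by one inter-domain edge, plus a binary latent variable `z` that either keeps the edge (`z = 1`, disagreeing labels are penalized) or cuts it (`z = 0`, a fixed cost is paid). The label set, the unary costs, and the `mismatch_cost` and `cut_cost` values are all hypothetical and hand-set here, whereas the abstract states that the potentials are learned from training data. Inference is done by brute-force enumeration, which is only feasible for such a tiny model.

```python
import itertools

# Hypothetical label set for the toy example.
LABELS = ("road", "building")


def energy(y2d, y3d, z, unary2d, unary3d, mismatch_cost, cut_cost):
    """Energy of a toy three-node model: a 2D label, a 3D label, and a latent z.

    z = 1 keeps the inter-domain edge, so differing labels pay mismatch_cost.
    z = 0 cuts the edge, so the labels are decoupled but cut_cost is paid.
    """
    e = unary2d[y2d] + unary3d[y3d]
    if z == 1:
        e += mismatch_cost if y2d != y3d else 0.0
    else:
        e += cut_cost
    return e


def map_inference(unary2d, unary3d, mismatch_cost=2.0, cut_cost=1.0):
    """Return the (y2d, y3d, z) assignment with the lowest energy.

    Exhaustive search over all joint states; fine for a toy model only.
    """
    states = itertools.product(LABELS, LABELS, (0, 1))
    return min(
        states,
        key=lambda s: energy(*s, unary2d, unary3d, mismatch_cost, cut_cost),
    )


# Case 1: the 3D evidence is weak, so the kept edge lets 2D evidence decide.
weak_3d = map_inference(
    {"road": 0.2, "building": 1.5},
    {"road": 0.9, "building": 0.6},
)

# Case 2: both domains are confident but disagree, so cutting the edge is cheaper.
conflict = map_inference(
    {"road": 0.1, "building": 3.0},
    {"road": 3.0, "building": 0.1},
)

print(weak_3d)
print(conflict)
```

In the first case the lowest-energy state keeps the edge and both regions share a label, so information flows across domains. In the second case the lowest-energy state sets `z = 0`, which corresponds to cutting the edge between inconsistent regions and letting each domain keep its own label.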
