Semantically Enhanced Global Reasoning for Semantic Segmentation

12/06/2022
by Mir Rayat Imtiaz Hossain, et al.

Recent advances in pixel-level tasks (e.g., segmentation) illustrate the benefit of long-range interactions between aggregated region-based representations that can enhance local features. However, such pixel-to-region associations and the resulting representations, which often take the form of attention, cannot model the underlying semantic structure of the scene (e.g., individual objects and, by extension, their interactions). In this work, we take a step toward addressing this limitation. Specifically, we propose an architecture that learns to project image features into latent region representations and performs global reasoning across them, using a transformer, to produce contextualized and scene-consistent representations that are then fused with the original pixel-level features. Our design enables the latent regions to represent semantically meaningful concepts by ensuring that activated regions are spatially disjoint and that unions of such regions correspond to connected object segments. The resulting semantic global reasoning (SGR) module is end-to-end trainable and can be combined with any semantic segmentation framework and backbone. Combining SGR with DeepLabV3 yields semantic segmentation performance competitive with the state of the art, while producing more semantically interpretable and diverse region representations, which we show transfer effectively to detection and instance segmentation. Further, we propose a new metric that measures the semantics of representations at both the object-class and instance levels.
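The abstract describes the pipeline only at a high level. The NumPy sketch below illustrates one plausible reading of it, under assumptions of our own rather than the authors' implementation: pixels are soft-assigned to K latent regions with a learned linear projection and a softmax, each region feature is the assignment-weighted mean of pixel features, a single-head self-attention layer stands in for the transformer, and the fused output is the original pixel features plus the contextualized regions scattered back through the same assignment. The names `sgr_block` and `overlap_penalty` and all weight shapes are hypothetical, and `overlap_penalty` is only an illustrative proxy for the "spatially disjoint regions" property, not the paper's loss.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sgr_block(feats, w_assign, w_q, w_k, w_v, w_out):
    """Hypothetical SGR-style block (illustrative sketch, not the paper's code).

    feats:         (N, C) pixel features, N = H * W flattened positions
    w_assign:      (C, K) projects each pixel to K latent-region logits
    w_q, w_k, w_v: (C, C) self-attention projections over regions
    w_out:         (C, C) projection applied before fusing back into pixels
    """
    # 1. Soft-assign every pixel to K latent regions (each row sums to 1).
    assign = softmax(feats @ w_assign, axis=1)               # (N, K)
    # 2. Pool pixel features into regions: assignment-weighted mean.
    mass = assign.sum(axis=0)[:, None] + 1e-6                # (K, 1)
    regions = (assign.T @ feats) / mass                      # (K, C)
    # 3. Global reasoning across regions: one single-head self-attention
    #    layer standing in for a full transformer (assumption).
    q, k, v = regions @ w_q, regions @ w_k, regions @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)    # (K, K)
    regions = regions + attn @ v                             # residual update
    # 4. Scatter contextualized regions back to pixels through the same
    #    assignment and fuse with the original features by addition.
    back = assign @ (regions @ w_out)                        # (N, C)
    return feats + back, assign


def overlap_penalty(assign):
    # Hypothetical proxy for "spatially disjoint regions": sum of products
    # between different regions' assignment weights, averaged over pixels.
    # It is 0 when every pixel belongs entirely to one region.
    gram = assign.T @ assign                                 # (K, K)
    off_diag = gram.sum() - np.trace(gram)
    return off_diag / assign.shape[0]


# Usage on random data: an 8x8 feature map (N = 64), 16 channels, 8 regions.
rng = np.random.default_rng(0)
N, C, K = 64, 16, 8
feats = rng.standard_normal((N, C))
shapes = [(C, K), (C, C), (C, C), (C, C), (C, C)]
weights = [0.1 * rng.standard_normal(s) for s in shapes]
fused, assign = sgr_block(feats, *weights)
print(fused.shape, assign.shape)  # (64, 16) (64, 8)
print(overlap_penalty(assign))    # a value in [0, 1 - 1/K]
```

Reusing one assignment matrix for both pooling and scattering keeps the pixel-to-region mapping consistent in both directions. A fuller version would add multi-head attention, normalization, and feed-forward layers, and the paper's actual constraint that unions of regions form connected object segments is not reproduced here.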


Related research

09/08/2016
Bottom-up Instance Segmentation using Deep Higher-Order CRFs
Traditional Scene Understanding problems such as Object Detection and Se...

07/01/2015
Polarimetric Hierarchical Semantic Model and Scattering Mechanism Based PolSAR Image Classification
For polarimetric SAR (PolSAR) image classification, it is a challenge to...

06/16/2018
Object Level Visual Reasoning in Videos
Human activity recognition is typically addressed by training models to ...

08/25/2022
Refine and Represent: Region-to-Object Representation Learning
Recent works in self-supervised learning have demonstrated strong perfor...

04/17/2020
MOPT: Multi-Object Panoptic Tracking
Comprehensive understanding of dynamic scenes is a critical prerequisite...

06/16/2020
Exploiting Visual Semantic Reasoning for Video-Text Retrieval
Video retrieval is a challenging research topic bridging the vision and ...

11/17/2019
Enhancing Generic Segmentation with Learned Region Representations
Current successful approaches for generic (non-semantic) segmentation re...
