Max Pooling with Vision Transformers reconciles class and shape in weakly supervised semantic segmentation

10/31/2022
by Simone Rossetti, et al.

Weakly Supervised Semantic Segmentation (WSSS) research has explored many directions to improve the typical pipeline of a CNN plus class activation maps (CAM) plus refinements, given the image-class label as the only supervision. Though the gap with fully supervised methods has narrowed, closing it further seems unlikely within this framework. On the other hand, WSSS methods based on Vision Transformers (ViT) have not yet explored valid alternatives to CAM. ViT features have been shown to retain scene layout and object boundaries in self-supervised learning. To confirm these findings, we prove that the advantages of transformers in self-supervised methods are further strengthened by Global Max Pooling (GMP), which can leverage patch features to negotiate pixel-label probability with class probability. This work proposes a new WSSS method, dubbed ViT-PCM (ViT Patch-Class Mapping), that is not based on CAM. The presented end-to-end network learns, in a single optimization process, refined shape and proper localization for segmentation masks. Our model outperforms the state of the art on baseline pseudo-masks (BPM), achieving 69.3% mIoU on the PascalVOC 2012 val set. We show that our approach has the fewest parameters while obtaining higher accuracy than all other approaches. In short, quantitative and qualitative results reveal that ViT-PCM is an excellent alternative to CNN-CAM-based architectures.
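
The idea of using Global Max Pooling to tie patch-level predictions to the image-level class label can be illustrated with a small sketch. This is not the authors' implementation: the per-patch softmax, the function names (`patch_class_probs`, `global_max_pool`, `patch_labels`), and the toy logits are all assumptions made for illustration, and a real model would obtain the patch logits from a ViT encoder and train the pooled scores against the image-class labels.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one patch's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def patch_class_probs(patch_logits):
    """Map each patch's logits to a distribution over classes.

    Assumption: one softmax per patch; the abstract does not specify this.
    """
    return [softmax(row) for row in patch_logits]


def global_max_pool(patch_probs):
    """For each class, keep the highest probability found in any patch.

    The result is one image-level score per class, which is what an
    image-class label can supervise.
    """
    num_classes = len(patch_probs[0])
    return [max(row[c] for row in patch_probs) for c in range(num_classes)]


def patch_labels(patch_probs):
    """Assign each patch its most likely class (a coarse pseudo-mask)."""
    return [max(range(len(row)), key=row.__getitem__) for row in patch_probs]


# Hypothetical toy input: 4 patches, 3 classes.
toy_logits = [
    [4.0, 0.0, 0.0],
    [3.0, 1.0, 0.0],
    [0.0, 5.0, 0.0],
    [2.0, 0.5, 0.0],
]

probs = patch_class_probs(toy_logits)
image_scores = global_max_pool(probs)
labels = patch_labels(probs)

print("image-level class scores:", [round(s, 3) for s in image_scores])
print("per-patch labels:", labels)
```

Because the pooled score for a class depends only on its best-scoring patch, a single confident patch is enough to mark the class as present in the image, while the per-patch distributions remain available for building a segmentation mask.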


