Semantic-Constraint Matching Transformer for Weakly Supervised Object Localization

09/04/2023
by Yiwen Cao, et al.

Weakly supervised object localization (WSOL) aims to learn to localize objects using only image-level supervision. Because convolution operations have local receptive fields, previous CNN-based methods suffer from partial activation, concentrating on the most discriminative part of an object rather than its full extent. Because the self-attention mechanism can capture long-range feature dependencies, the Vision Transformer has recently been applied to alleviate this local activation drawback. However, since the transformer lacks the inductive localization bias that is inherent in CNNs, it may cause a divergent activation problem, resulting in an uncertain distinction between foreground and background. In this work, we propose a novel Semantic-Constraint Matching Network (SCMN), built on a transformer, to converge the divergent activation. Specifically, we first propose a local patch shuffle strategy to construct image pairs, disrupting local patches while guaranteeing global consistency. The paired images, which spatially share a common object, are then fed into a Siamese network encoder. We further design a semantic-constraint matching module, which mines the co-object part by matching the coarse class activation maps (CAMs) extracted from the paired images, thus implicitly guiding and calibrating the transformer network to alleviate the divergent activation. Extensive experiments on two challenging benchmarks, CUB-200-2011 and ILSVRC, show that our method achieves new state-of-the-art performance and outperforms previous methods by a large margin.
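
The abstract does not spell out how the local patch shuffle is carried out. As a minimal sketch of one possible reading, the Python below splits an image into square patches, groups them into small windows, and permutes the patches only within each window, so the local arrangement is disrupted while every patch stays in its original region. The function name `local_patch_shuffle`, the `patch` and `window` parameters, and the window-wise permutation itself are assumptions made for illustration, not details taken from the paper.

```python
import random


def local_patch_shuffle(image, patch=2, window=2, seed=None):
    """Shuffle patches inside each local window; windows stay in place.

    Hypothetical illustration, not the authors' implementation.
    image  : list of rows (H x W); H and W must be divisible by patch * window
    patch  : side length of one square patch, in pixels
    window : side length of one shuffle window, in patches
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    block = patch * window  # window side length in pixels
    if h % block or w % block:
        raise ValueError("image sides must be divisible by patch * window")

    out = [row[:] for row in image]  # copy so the input is left untouched
    for top in range(0, h, block):
        for left in range(0, w, block):
            # Top-left corners of every patch in this window.
            slots = [(top + i * patch, left + j * patch)
                     for i in range(window) for j in range(window)]
            sources = slots[:]
            rng.shuffle(sources)
            # Copy each randomly chosen source patch into its slot.
            for (dy, dx), (sy, sx) in zip(slots, sources):
                for r in range(patch):
                    out[dy + r][dx:dx + patch] = image[sy + r][sx:sx + patch]
    return out


# Toy 4x4 "image": the original and the shuffled copy would form one pair.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
pair = (img, local_patch_shuffle(img, patch=1, window=2, seed=0))
```

Under this reading, the original image and its shuffled copy would form the pair passed to the Siamese encoder. Because patches never leave their window, the object occupies roughly the same region in both images, which is what makes matching the two coarse CAMs meaningful.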
