TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

03/27/2021
by Wei Gao, et al.

Weakly supervised object localization (WSOL) is a challenging problem in which object localization models must be learned when only image category labels are given. Optimizing a convolutional neural network (CNN) for classification tends to activate local discriminative regions while ignoring the complete object extent, causing the partial activation issue. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNNs, where the convolution operations produce local receptive fields and have difficulty capturing long-range feature dependencies among pixels. We introduce the token semantic coupled attention map (TS-CAM) to take full advantage of the self-attention mechanism in the visual transformer for long-range dependency extraction. TS-CAM first splits an image into a sequence of patch tokens for spatial embedding, which produces attention maps of long-range visual dependency to avoid partial activation. TS-CAM then re-allocates category-related semantics to the patch tokens, enabling each of them to be aware of object categories. TS-CAM finally couples the patch tokens with the semantic-agnostic attention map to achieve semantic-aware localization. Experiments on the ILSVRC/CUB-200-2011 datasets show that TS-CAM outperforms its CNN-CAM counterparts by 7.1
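
The pipeline described in the abstract (split the image into patch tokens, obtain a semantic-agnostic attention map, then couple it with per-category semantics) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `split_into_patches` and `couple_maps` are hypothetical, the attention and semantic maps are assumed to be already computed by some transformer, and the coupling is assumed here to be an element-wise product followed by min-max normalization, which the abstract does not specify.

```python
import numpy as np


def split_into_patches(image, patch):
    """Split an (H, W, C) image into a sequence of flattened patch tokens.

    Returns the tokens, shape (h * w, patch * patch * C), and the grid (h, w).
    Rows and columns that do not fill a whole patch are cropped.
    """
    H, W, C = image.shape
    h, w = H // patch, W // patch
    image = image[: h * patch, : w * patch]
    # (h, patch, w, patch, C) -> (h, w, patch, patch, C)
    tokens = image.reshape(h, patch, w, patch, C).transpose(0, 2, 1, 3, 4)
    return tokens.reshape(h * w, patch * patch * C), (h, w)


def couple_maps(attention, semantic_maps, class_idx):
    """Couple a semantic-agnostic attention map with one category's semantics.

    attention:     (h, w) map over the patch grid (assumed precomputed).
    semantic_maps: (K, h, w) per-category scores for each patch token
                   (assumed precomputed).
    The element-wise product is an assumption made for illustration.
    """
    coupled = attention * semantic_maps[class_idx]
    coupled = coupled - coupled.min()
    peak = coupled.max()
    # Normalize to [0, 1]; leave an all-zero map untouched.
    return coupled / peak if peak > 0 else coupled
```

Thresholding the returned map and taking the bounding box of the surviving patches would give a localization estimate; the threshold value is a further design choice not covered by the abstract.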


