LCTR: On Awakening the Local Continuity of Transformer for Weakly Supervised Object Localization

12/10/2021
by Zhiwei Chen, et al.

Weakly supervised object localization (WSOL) aims to learn an object localizer using only image-level labels. Convolutional neural network (CNN)-based techniques often highlight the most discriminative part of an object while ignoring its full extent. Recently, the transformer architecture has been deployed for WSOL to capture long-range feature dependencies through its self-attention mechanism and multilayer perceptron structure. Nevertheless, transformers lack the locality inductive bias inherent to CNNs and may therefore deteriorate local feature details in WSOL. In this paper, we propose a novel transformer-based framework, termed LCTR (Local Continuity TRansformer), which aims to enhance the local perception capability of global features among long-range feature dependencies. To this end, we propose a relational patch-attention module (RPAM), which considers cross-patch information on a global basis. We further design a cue digging module (CDM), which uses local features to guide the model's learning toward highlighting weak local responses. Finally, comprehensive experiments on two widely used datasets, i.e., CUB-200-2011 and ILSVRC, verify the effectiveness of our method.
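The abstract credits the transformer's self-attention mechanism with capturing long-range dependencies among image patches. As a rough illustration of why that holds, the sketch below implements generic single-head self-attention in plain Python: every patch's output is a weighted mix of all patches, so distant patches can influence one another. This is a minimal sketch for intuition only; it is not the paper's RPAM or CDM, whose exact formulations appear in the full text, and the function names here are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(patches):
    """Generic single-head self-attention (illustrative, not the paper's RPAM).

    patches: list of N equal-length feature vectors (lists of floats).
    Returns N output vectors; each is a convex combination of ALL patch
    features, so even spatially distant patches interact (long-range
    dependency), which is the property the abstract refers to.
    """
    d = len(patches[0])
    scale = math.sqrt(d)  # standard scaled dot-product attention
    outputs = []
    for q in patches:  # each patch acts as a query
        # scaled dot-product similarity of this patch to every patch
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in patches]
        weights = softmax(scores)  # attention weights sum to 1
        # weighted sum of all patch features
        out = [sum(w * v[j] for w, v in zip(weights, patches)) for j in range(d)]
        outputs.append(out)
    return outputs

# Toy example: 3 "patches" with 2-dimensional features.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = self_attention(feats)
```

Because each output is a softmax-weighted average, every patch receives nonzero contribution from every other patch; LCTR's contribution, per the abstract, is adding back local continuity that this purely global mixing can wash out.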


