CASTing Your Model: Learning to Localize Improves Self-Supervised Representations

Recent advances in self-supervised learning (SSL) have largely closed the gap with supervised ImageNet pretraining. Despite their success, these methods have been applied primarily to unlabeled ImageNet images and show only marginal gains when trained on larger sets of uncurated images. We hypothesize that current SSL methods perform best on iconic images and struggle on complex scene images containing many objects. Analyzing contrastive SSL methods shows that they have poor visual grounding and receive a poor supervisory signal when trained on scene images. We propose Contrastive Attention-Supervised Tuning (CAST) to overcome these limitations. CAST uses unsupervised saliency maps to intelligently sample crops and to provide grounding supervision via a Grad-CAM attention loss. Experiments on COCO show that CAST significantly improves the features that SSL methods learn on scene images, and further experiments show that CAST-trained models are more robust to changes in backgrounds.
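The saliency-guided crop sampling described above can be sketched as simple rejection sampling: draw random crop positions and keep one whose overlap with the salient region exceeds a threshold. This is a minimal illustrative sketch, not the authors' exact procedure; the function name, the `min_overlap` threshold, and the fallback behavior are assumptions.

```python
import numpy as np

def saliency_constrained_crop(saliency, crop_size, min_overlap=0.2,
                              rng=None, max_tries=100):
    """Sample a crop whose overlap with the salient region is at least
    `min_overlap` (fraction of crop pixels that are salient).

    saliency : binary H x W mask (1 = salient), e.g. from an
               unsupervised saliency method.
    Returns the (y, x) top-left corner of an accepted crop; falls back
    to the last candidate if no crop passes within `max_tries`.
    Illustrative rejection-sampling sketch, not the paper's exact scheme.
    """
    rng = rng or np.random.default_rng()
    H, W = saliency.shape
    ch, cw = crop_size
    y = x = 0
    for _ in range(max_tries):
        y = int(rng.integers(0, H - ch + 1))
        x = int(rng.integers(0, W - cw + 1))
        overlap = saliency[y:y + ch, x:x + cw].mean()
        if overlap >= min_overlap:
            return y, x
    return y, x  # fallback: last candidate tried
```

Constraining both views of a contrastive pair this way keeps the shared object in frame, so the positive pair actually depicts the same content, which is the supervisory signal the abstract argues plain random cropping fails to provide on scene images.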


