Rethinking 360° Image Visual Attention Modelling with Unsupervised Learning

06/09/2022
by krishnatarun7, et al.

Despite the success of self-supervised representation learning on planar data, it has not yet been studied on 360° images. In this paper, we extend recent advances in contrastive learning to learn latent representations that are sufficiently invariant to be highly effective for spherical saliency prediction as a downstream task. We argue that omnidirectional images are particularly suited to such an approach because of the geometry of the data domain. To test this hypothesis, we design an unsupervised framework that maximizes the mutual information between different views taken from both the equator and the poles. We show that a decoder can learn good-quality saliency distributions from the encoder embeddings. Our model compares favorably with fully supervised methods on the Salient360!, VR-EyeTracking and Sitzmann datasets. This performance is achieved with an encoder trained in a completely unsupervised way and a relatively lightweight supervised decoder (3.8× fewer parameters in the case of the ResNet50 encoder). We believe this combination of supervised and unsupervised learning is an important step toward flexible formulations of human visual attention. The code to reproduce the results is available on GitHub.
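The core mechanism described above is a contrastive objective that maximizes mutual information between views of the same 360° image. The sketch below assumes the widely used InfoNCE loss with cosine similarity and a temperature. The function names `cosine` and `info_nce`, the temperature of 0.1, and the choice to use the other images in a batch as negatives are illustrative assumptions, not the authors' exact formulation. Minimizing this loss maximizes a lower bound on the mutual information between the two views, since I(z_a; z_b) ≥ log N − L for a batch of N images.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(z_a, z_b, temperature=0.1):
    """Mean InfoNCE loss (hypothetical sketch, not the paper's code).

    z_a[i] and z_b[i] are embeddings of two views of image i
    (e.g. an equatorial view and a polar view). For each anchor z_a[i],
    the positive is z_b[i]; every z_b[j] with j != i acts as a negative.
    """
    n = len(z_a)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every candidate, scaled by temperature.
        logits = [cosine(z_a[i], z_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching view as the target class.
        total += log_sum_exp - logits[i]
    return total / n


if __name__ == "__main__":
    # Toy embeddings: two images, two views each.
    za = [[1.0, 0.0], [0.0, 1.0]]
    zb = [[0.9, 0.1], [0.1, 0.9]]
    print("aligned views: ", info_nce(za, zb))
    print("mismatched views:", info_nce(za, zb[::-1]))
```

Correctly paired views produce a much lower loss than mismatched pairs, which is the signal that pushes the encoder toward representations shared across equatorial and polar views. In a real training loop the embeddings would come from the encoder (for example, the ResNet50 backbone mentioned above) and the loss would be computed with a tensor library rather than plain Python lists.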

research
11/11/2020

Unsupervised Learning of Dense Visual Representations

Contrastive self-supervised learning has emerged as a promising approach...
research
05/20/2020

What makes for good views for contrastive learning

Contrastive learning between multiple views of the data has recently ach...
research
02/22/2023

Saliency Guided Contrastive Learning on Scene Images

Self-supervised learning holds promise in leveraging large numbers of un...
research
12/08/2020

CASTing Your Model: Learning to Localize Improves Self-Supervised Representations

Recent advances in self-supervised learning (SSL) have largely closed th...
research
12/25/2020

Taxonomy of multimodal self-supervised representation learning

Sensory input from multiple sources is crucial for robust and coherent h...
research
05/26/2023

Unsupervised Embedding Quality Evaluation

Unsupervised learning has recently significantly gained in popularity, e...
research
03/20/2021

Unsupervised Feature Learning for Manipulation with Contrastive Domain Randomization

Robotic tasks such as manipulation with visual inputs require image feat...