Semantic Segmentation Enhanced Transformer Model for Human Attention Prediction

01/26/2023
by Shuo Zhang, et al.

Saliency prediction aims to predict the distribution of human visual attention over a given RGB image. Most recent state-of-the-art methods rely on deep image feature representations from traditional CNNs. However, traditional convolution cannot capture the global features of an image well because of its small kernel size. In addition, high-level factors that correlate closely with human visual perception, e.g., objects, color, and light, are not considered. Motivated by these limitations, we propose a Transformer-based method with semantic segmentation as an additional learning objective. The Transformer captures more global cues from the image, and learning object segmentation at the same time simulates human visual perception, a claim we examine through an investigation of human gaze control in cognitive science. We build an extra decoder for the subtask, and both tasks share the same Transformer encoder, which forces the encoder to learn from multiple feature spaces. In practice, we find that simply adding the subtask can confuse learning of the main task, so we propose a Multi-task Attention Module to handle the feature interaction between the learning targets. Our method achieves competitive performance compared to other state-of-the-art methods.
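The abstract does not say how the two objectives are combined during training. As a rough illustration only, one common way to train a shared encoder on a main task and a subtask is a weighted sum of the two losses. The sketch below assumes a KL-divergence loss for the saliency map and a per-pixel cross-entropy loss for segmentation. These loss choices, the function names, and the `seg_weight` parameter are all hypothetical and are not taken from the paper, and the Multi-task Attention Module is not modeled here.

```python
import math


def kl_divergence(pred, target, eps=1e-8):
    """KL(target || pred) for two flattened saliency maps.

    Both maps are normalized to sum to 1 before comparison.
    Assumed saliency loss; the abstract does not name one.
    """
    pred_sum = sum(pred)
    target_sum = sum(target)
    total = 0.0
    for p, t in zip(pred, target):
        p_n = p / (pred_sum + eps)
        t_n = t / (target_sum + eps)
        total += t_n * math.log(t_n / (p_n + eps) + eps)
    return total


def cross_entropy(class_probs, labels, eps=1e-8):
    """Mean per-pixel cross-entropy.

    class_probs[i] is the list of class probabilities for pixel i,
    labels[i] is the index of the true class for pixel i.
    Assumed segmentation loss; the abstract does not name one.
    """
    losses = [-math.log(probs[y] + eps) for probs, y in zip(class_probs, labels)]
    return sum(losses) / len(losses)


def joint_loss(sal_pred, sal_target, seg_probs, seg_labels, seg_weight=0.5):
    """Weighted sum of the main-task and subtask losses.

    seg_weight is a hypothetical hyperparameter controlling how much
    the segmentation subtask contributes to the shared encoder's update.
    """
    main = kl_divergence(sal_pred, sal_target)
    sub = cross_entropy(seg_probs, seg_labels)
    return main + seg_weight * sub


if __name__ == "__main__":
    # Toy 4-pixel example with two segmentation classes.
    target_map = [0.1, 0.2, 0.3, 0.4]
    predicted_map = [0.25, 0.25, 0.25, 0.25]
    seg_probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
    seg_labels = [0, 0, 1, 1]
    print(joint_loss(predicted_map, target_map, seg_probs, seg_labels))
```

With a fixed weight like this, the subtask gradient always reaches the shared encoder, which is one way the interference described in the abstract could arise.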

research
03/21/2023

LiDARFormer: A Unified Transformer-based Multi-task Network for LiDAR Perception

There is a recent trend in the LiDAR perception field towards unifying m...
research
03/08/2022

BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs

Semantic segmentation in bird's eye view (BEV) is an important task for ...
research
08/03/2022

SSformer: A Lightweight Transformer for Semantic Segmentation

It is well believed that Transformer performs better in semantic segment...
research
03/14/2022

TransCAM: Transformer Attention-based CAM Refinement for Weakly Supervised Semantic Segmentation

Weakly supervised semantic segmentation (WSSS) with only image-level sup...
research
04/25/2021

Visual Saliency Transformer

Recently, massive saliency detection methods have achieved promising res...
research
11/02/2022

Semantic SuperPoint: A Deep Semantic Descriptor

Several SLAM methods benefit from the use of semantic information. Most ...
research
07/05/2022

FishFormer: Annulus Slicing-based Transformer for Fisheye Rectification with Efficacy Domain Exploration

Numerous significant progress on fisheye image rectification has been ac...
