Eye-gaze-guided Vision Transformer for Rectifying Shortcut Learning

05/25/2022
by Chong Ma, et al.

Learning harmful shortcuts such as spurious correlations and biases prevents deep neural networks from learning meaningful and useful representations, jeopardizing the generalizability and interpretability of the learned model. The situation is even more serious in medical imaging, where clinical data (e.g., MR images with pathology) are scarce while reliability, generalizability, and transparency of the learned model are highly required. To address this problem, we propose to infuse human experts' intelligence and domain knowledge into the training of deep neural networks. The core idea is to use the visual attention of expert radiologists to proactively guide the deep model to focus on regions with potential pathology and to avoid being trapped in harmful shortcuts. To this end, we propose a novel eye-gaze-guided vision transformer (EG-ViT) for diagnosis with limited medical image data. We mask the input image patches that fall outside the radiologists' regions of interest and add a residual connection in the last encoder layer of EG-ViT to maintain the correlations among all patches. Experiments on two public datasets, INbreast and SIIM-ACR, demonstrate that EG-ViT can effectively learn and transfer experts' domain knowledge and achieves much better performance than the baselines. Meanwhile, it successfully rectifies harmful shortcut learning and significantly improves the model's interpretability. In general, EG-ViT combines the advantages of human experts' prior knowledge and the power of deep neural networks. This work opens new avenues for advancing current artificial intelligence paradigms by infusing human intelligence.
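To illustrate the gaze-guided masking idea in the abstract, the sketch below keeps only the image patches that receive sufficient radiologist gaze density and masks the rest. This is a minimal illustration, not the authors' implementation: the function `gaze_patch_mask`, the top-k selection rule, and all parameter values are assumptions.

```python
import numpy as np

def gaze_patch_mask(heatmap, patch_size, keep_ratio=0.5):
    """Boolean mask over the patch grid: True = keep (inside the
    radiologists' gaze), False = mask out before feeding the ViT.
    Hypothetical helper; the paper's exact masking rule may differ."""
    H, W = heatmap.shape
    ph, pw = H // patch_size, W // patch_size
    # Accumulate gaze density within each non-overlapping patch
    grid = heatmap[:ph * patch_size, :pw * patch_size]
    grid = grid.reshape(ph, patch_size, pw, patch_size).sum(axis=(1, 3))
    # Keep the top-k patches by gaze mass (assumed selection rule)
    k = max(1, int(keep_ratio * ph * pw))
    thresh = np.sort(grid.ravel())[::-1][k - 1]
    return grid >= thresh

# Toy example: an 8x8 gaze heatmap split into 4x4 patches (2x2 grid),
# with gaze concentrated in the top-left quadrant
hm = np.zeros((8, 8))
hm[:4, :4] = 1.0
mask = gaze_patch_mask(hm, patch_size=4, keep_ratio=0.25)
```

Here `mask` is a 2x2 boolean grid with only the top-left patch kept; in EG-ViT the masked patches would be excluded from the encoder input, while the residual connection in the last layer preserves correlations among all patches.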


Related research:

- Rectify ViT Shortcut Learning by Visual Saliency (06/17/2022)
- Matching Representations of Explainable Artificial Intelligence and Eye Gaze for Human-Machine Interaction (01/30/2021)
- Transparency of Deep Neural Networks for Medical Image Analysis: A Review of Interpretability Methods (11/01/2021)
- Mask-guided Vision Transformer (MG-ViT) for Few-Shot Learning (05/20/2022)
- Core-Periphery Principle Guided Redesign of Self-Attention in Transformers (03/27/2023)
- Eye tracking guided deep multiple instance learning with dual cross-attention for fundus disease detection (04/25/2023)
- Gaze-Guided Class Activation Mapping: Leveraging Human Attention for Network Attention in Chest X-rays Classification (02/15/2022)
