Fixating on Attention: Integrating Human Eye Tracking into Vision Transformers

08/26/2023
by Sharath Koorathota, et al.

Modern transformer-based models designed for computer vision have outperformed humans across a spectrum of visual tasks. However, critical tasks, such as medical image interpretation or autonomous driving, still rely on human judgment. This work demonstrates how human visual input, specifically fixations collected from an eye-tracking device, can be integrated into transformer models to improve accuracy across multiple driving situations and datasets. First, we establish the significance of fixation regions in left-right driving decisions, as observed in both human subjects and a Vision Transformer (ViT). By measuring the similarity between human fixation maps and ViT attention weights, we show how their overlap varies across individual heads and layers. This overlap is exploited for model pruning without compromising accuracy. We then combine information from the driving scene with fixation data, employing a "joint space-fixation" (JSF) attention setup. Lastly, we propose a "fixation-attention intersection" (FAX) loss to train the ViT model to attend to the same regions that humans fixate on. We find that JSF and FAX improve ViT accuracy and reduce the number of training epochs required. These results hold significant implications for human-guided artificial intelligence.
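
The comparison between human fixation maps and ViT attention weights lends itself to a short illustration. The sketch below is a minimal, hypothetical example of how a per-head, per-layer overlap score could be computed with cosine similarity; the function name, the tensor shapes, and the choice of similarity metric are assumptions for illustration, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def fixation_attention_similarity(attn_weights: torch.Tensor,
                                   fixation_map: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between a human fixation map and ViT attention.

    attn_weights : (layers, heads, num_patches) attention from the CLS token
                   to the image patches.
    fixation_map : (num_patches,) human fixation density resampled onto the
                   ViT patch grid.
    Returns a (layers, heads) matrix of similarity scores.
    """
    fix = F.normalize(fixation_map.flatten().float(), dim=0)
    attn = F.normalize(attn_weights.float(), dim=-1)
    return attn @ fix  # one score per (layer, head) pair
```

Under this reading, heads whose attention shows little overlap with human fixations would be natural candidates for pruning, consistent with the abstract's claim that the overlap can be exploited for pruning without compromising accuracy.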
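The "fixation-attention intersection" (FAX) loss is described only at a high level here, so the following is a hedged sketch of one plausible formulation: penalising one minus the normalised intersection between the model's CLS-to-patch attention and the human fixation map. The function name `fax_loss`, the histogram-intersection form, and the weighting term `fax_weight` are illustrative assumptions, and the paper's exact formulation may differ.

```python
import torch

def fax_loss(attn_cls_to_patches: torch.Tensor,
             fixation_map: torch.Tensor,
             eps: float = 1e-8) -> torch.Tensor:
    """Sketch of an intersection-style loss between attention and fixations.

    attn_cls_to_patches : (batch, num_patches) non-negative attention, e.g.
                          averaged over selected heads and layers.
    fixation_map        : (batch, num_patches) human fixation densities.
    """
    a = attn_cls_to_patches / (attn_cls_to_patches.sum(-1, keepdim=True) + eps)
    f = fixation_map / (fixation_map.sum(-1, keepdim=True) + eps)
    intersection = torch.minimum(a, f).sum(-1)  # in [0, 1]; 1 means identical maps
    return (1.0 - intersection).mean()

# Hypothetical usage: add the term to the ordinary classification loss.
# total_loss = task_loss + fax_weight * fax_loss(attn, fixations)
```

Training with such a term pushes the ViT to attend to the regions humans fixated on, which is the behaviour the abstract attributes to FAX.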


Related research

05/29/2021 · FoveaTer: Foveated Transformer for Image Classification
03/24/2022 · Transformers Meet Visual Learning Understanding: A Comprehensive Review
08/11/2023 · Evidence of Human-Like Visual-Linguistic Integration in Multimodal Large Language Models During Predictive Language Processing
07/22/2022 · Applying Spatiotemporal Attention to Identify Distracted and Drowsy Driving with Vision Transformers
08/10/2021 · Understanding Character Recognition using Visual Explanations Derived from the Human Visual System and Deep Networks
07/14/2021 · Passive attention in artificial neural networks predicts human visual selectivity
07/31/2013 · A Prototyping Environment for Integrated Artificial Attention Systems
