Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention

03/27/2023
by Sounak Mondal, et al.

Predicting human gaze is important in Human-Computer Interaction (HCI). However, to practically serve HCI applications, gaze prediction models must be scalable, fast, and accurate in their spatial and temporal gaze predictions. Recent scanpath prediction models focus on goal-directed attention (search). Such models are limited in their application because they commonly rely on trained target detectors for all possible objects and on the availability of human gaze data for training, neither of which is scalable. In response, we pose a new task called ZeroGaze, a new variant of zero-shot learning in which gaze is predicted for never-before-searched objects, and we develop a novel model, Gazeformer, to solve the ZeroGaze problem. In contrast to existing methods that use object detector modules, Gazeformer encodes the target using a natural language model, thus leveraging semantic similarities in scanpath prediction. We use a transformer-based encoder-decoder architecture because transformers are particularly useful for generating contextual representations. Gazeformer surpasses other models by a large margin in the ZeroGaze setting. It also outperforms existing target-detection models on standard gaze prediction for both target-present and target-absent search tasks. In addition to its improved performance, Gazeformer is more than five times faster than the state-of-the-art target-present visual search model.
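
The abstract says that Gazeformer encodes the search target with a language model and processes it with a transformer encoder-decoder, but gives no implementation details. As a rough intuition only, the sketch here shows how a text embedding of the target could steer where a model "looks": a single-head scaled dot-product cross-attention in which the target embedding is the query and image patch features are the keys. This is not Gazeformer's architecture. The patch features, target embedding, and projection matrices are random placeholders standing in for a visual backbone, a language model, and learned weights, and taking the attention-weighted mean of patch centers as a fixation is a simplification chosen for illustration. The function name `target_conditioned_fixation` is hypothetical.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def target_conditioned_fixation(patch_feats, patch_centers, target_emb, w_q, w_k):
    """Toy cross-attention in which a target text embedding queries image patches.

    Hypothetical illustration, not the Gazeformer model.
    patch_feats:   (N, d_img) visual features, one row per image patch (placeholder)
    patch_centers: (N, 2) (x, y) center of each patch, in pixels
    target_emb:    (d_txt,) embedding of the target name (placeholder for a language model)
    w_q: (d_txt, d), w_k: (d_img, d) projections (learned in practice, random here)
    Returns attention weights over patches and their weighted-mean (x, y) location.
    """
    q = target_emb @ w_q                       # (d,) query from the target text
    k = patch_feats @ w_k                      # (N, d) keys from the image patches
    scores = k @ q / np.sqrt(q.shape[0])       # (N,) scaled dot-product scores
    attn = softmax(scores)                     # (N,) non-negative, sums to 1
    fixation = attn @ patch_centers            # (2,) convex combination of centers
    return attn, fixation


# Usage with random (untrained) weights on a 4x4 grid of 80x80-pixel patches.
rng = np.random.default_rng(0)
ys, xs = np.mgrid[0:4, 0:4]
centers = np.stack([xs.ravel() * 80 + 40, ys.ravel() * 80 + 40], axis=1).astype(float)
feats = rng.normal(size=(16, 32))              # placeholder patch features
target = rng.normal(size=(24,))                # placeholder target-name embedding
w_q = rng.normal(size=(24, 16))
w_k = rng.normal(size=(32, 16))
attn, fix = target_conditioned_fixation(feats, centers, target, w_q, w_k)
print(attn.shape, fix)
```

Changing only the target embedding changes the attention map and therefore the predicted location, while the image features stay the same. This is the property that lets a text-conditioned model produce a prediction for a target it never saw during training, whereas a detector-based model is limited to the classes it was trained to detect.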

research
11/27/2016

Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling

Predicting the target of visual search from eye fixation (gaze) data is ...
research
03/20/2022

End-to-End Human-Gaze-Target Detection with Transformers

In this paper, we propose an effective and efficient method for Human-Ga...
research
01/04/2018

Object Referring in Videos with Language and Human Gaze

We investigate the problem of object referring (OR) i.e. to localize a t...
research
03/16/2023

Predicting Human Attention using Computational Attention

Most models of visual attention are aimed at predicting either top-down ...
research
07/19/2017

Supervising Neural Attention Models for Video Captioning by Human Gaze Data

The attention mechanisms in deep neural networks are inspired by human's...
research
11/24/2022

Efficient Zero-shot Visual Search via Target and Context-aware Transformer

Visual search is a ubiquitous challenge in natural vision, including dai...
research
05/28/2020

Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning

Being able to predict human gaze behavior has obvious importance for beh...
