Seeing with Humans: Gaze-Assisted Neural Image Captioning

08/18/2016
by Yusuke Sugano, et al.

Gaze reflects how humans process visual scenes and is therefore increasingly used in computer vision systems. Previous work has demonstrated the potential of gaze for object-centric tasks, such as object localization and recognition, but it remains unclear whether gaze can also benefit scene-centric tasks, such as image captioning. We present a new perspective on gaze-assisted image captioning by studying the interplay between human gaze and the attention mechanism of deep neural networks. Using a public large-scale gaze dataset, we first assess the relationship between state-of-the-art object and scene recognition models, bottom-up visual saliency, and human gaze. We then propose a novel split attention model for image captioning. Our model integrates human gaze information into an attention-based long short-term memory architecture and allows the algorithm to allocate attention selectively to both fixated and non-fixated image regions. Through evaluation on the COCO/SALICON datasets, we show that our method improves image captioning performance and that gaze can complement machine attention for semantic scene understanding tasks.
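To make the idea of allocating attention to both fixated and non-fixated regions more concrete, the following is a minimal NumPy sketch of one way such a split could work. It is an illustration under stated assumptions and not the authors' formulation: the function name `split_attention`, the parameter vectors `w_fix` and `w_non`, and the use of simple linear scoring are all hypothetical, and the abstract does not specify how the gaze map enters the model. The sketch scores each image region twice, blends the two scores by a per-region gaze value in [0, 1], and normalizes the result into a single attention distribution.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def split_attention(features, hidden, gaze, w_fix, w_non):
    """Hypothetical gaze-split attention step (illustrative assumption).

    features: (R, D) array, one feature vector per image region
    hidden:   (H,) array, current LSTM hidden state
    gaze:     (R,) array in [0, 1], normalized fixation density per region
    w_fix:    (D + H,) scoring weights applied to fixated regions
    w_non:    (D + H,) scoring weights applied to non-fixated regions
    """
    num_regions = features.shape[0]
    # Pair every region feature with the same hidden state.
    joint = np.concatenate(
        [features, np.tile(hidden, (num_regions, 1))], axis=1
    )
    score_fix = joint @ w_fix  # relevance score if the region was fixated
    score_non = joint @ w_non  # relevance score if it was not fixated
    # Blend the two scores per region according to the gaze value.
    logits = gaze * score_fix + (1.0 - gaze) * score_non
    weights = softmax(logits)      # one attention distribution over regions
    context = weights @ features   # (D,) attended feature vector
    return weights, context


# Example call with random inputs (shapes chosen only for illustration).
rng = np.random.default_rng(0)
features = rng.normal(size=(196, 32))
hidden = rng.normal(size=16)
gaze = rng.uniform(size=196)
w_fix = rng.normal(size=48)
w_non = rng.normal(size=48)
weights, context = split_attention(features, hidden, gaze, w_fix, w_non)
```

Because the two scoring functions have separate parameters, a model of this shape could in principle learn to attend to regions that humans did not fixate when they are useful for the caption, instead of treating gaze as a hard mask.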
