Neural Attention for Image Captioning: Review of Outstanding Methods

11/29/2021
by Zanyar Zohourianshahzadi, et al.

Image captioning is the task of automatically generating sentences that best describe an input image. The most successful recent techniques for this task use attentive deep learning models, and these models vary in how attention is designed. In this survey, we review the literature on attentive deep learning models for image captioning. Rather than comprehensively reviewing all prior work on deep image captioning models, we explain the various types of attention mechanisms that these models use. The most successful deep learning models for image captioning follow the encoder-decoder architecture, although they differ in how they employ attention mechanisms. By analyzing the performance results reported for different attentive deep models, we aim to identify the most successful types of attention mechanisms for image captioning. Soft attention, bottom-up attention, and multi-head attention are the types of attention mechanism widely used in state-of-the-art attentive deep learning models for image captioning. Currently, the best results are achieved by variants of multi-head attention combined with bottom-up attention.
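As a rough illustration of what soft attention computes at a single decoding step, the sketch below weights a set of image region features by their relevance to a decoder state and averages them into one context vector. It is a minimal example under stated assumptions, not the method of any particular paper covered by the survey: the names `soft_attention`, `query`, and `regions` are hypothetical, the feature values are made up, and a scaled dot product is assumed for scoring (captioning models often use a small learned network for this step instead).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(query, regions):
    """Return (weights, context) for one decoding step.

    query   -- decoder state vector (hypothetical name), length d
    regions -- list of image region feature vectors, each of length d
    """
    d = len(query)
    # Assumed scoring function: scaled dot product of query and region.
    scores = [
        sum(q * r for q, r in zip(query, region)) / math.sqrt(d)
        for region in regions
    ]
    # Weights are non-negative and sum to 1 across all regions.
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    context = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(d)
    ]
    return weights, context

# Toy input: three region features of dimension 2 (illustrative values).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = soft_attention(query, regions)
```

Because every region receives a non-zero weight, the context vector is a smooth mixture of all regions. Multi-head attention, as named in the abstract, can be thought of as running several such weightings in parallel on different learned projections of the same features.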

research
01/31/2022

Deep Learning Approaches on Image Captioning: A Review

Automatic image captioning, which involves describing the contents of an...
research
11/17/2016

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

Visual attention has been successfully applied in structural prediction ...
research
01/04/2020

Understanding Image Captioning Models beyond Visualizing Attention

This paper explains predictions of image captioning models with attentio...
research
12/30/2022

On the Interpretability of Attention Networks

Attention mechanisms form a core component of several successful deep le...
research
07/15/2022

LineCap: Line Charts for Data Visualization Captioning Models

Data visualization captions help readers understand the purpose of a vis...
research
10/23/2018

Area Attention

Existing attention mechanisms are mostly item-based in that a model is ...
research
05/20/2019

Image Captioning based on Deep Learning Methods: A Survey

Image captioning is a challenging task and attracting more and more atte...
