A Thorough Review on Recent Deep Learning Methodologies for Image Captioning

07/28/2021
by Ahmed Elhagry, et al.

Image captioning is a task that combines computer vision and natural language processing; it aims to generate descriptive captions for images. It is a two-fold process that relies on accurate image understanding and on correct language understanding, both syntactic and semantic. Keeping up with the latest research and findings in image captioning is becoming increasingly difficult as the body of knowledge on the topic grows, yet the available review papers do not cover these findings sufficiently. In this paper, we survey the current techniques, datasets, benchmarks and evaluation metrics used in image captioning. Current research in the field focuses mostly on deep learning-based methods, with attention mechanisms, deep reinforcement learning and adversarial learning at the forefront. We review recent methodologies such as UpDown, OSCAR, VIVO, meta learning and a model that uses conditional generative adversarial nets (GANs). Although the GAN-based model achieves the highest score, UpDown represents an important basis for image captioning, while OSCAR and VIVO are more useful because they support novel object captioning. This review paper serves as a roadmap for researchers to keep up to date with the latest contributions to image caption generation.
