Neural Image Captioning

07/02/2019
by Elaina Tan, et al.

In recent years, the biggest advances in major Computer Vision tasks, such as object recognition, handwritten-digit identification, and facial recognition, have come through the use of Convolutional Neural Networks (CNNs). Similarly, in Natural Language Processing, Recurrent Neural Networks (RNNs), and Long Short-Term Memory networks (LSTMs) in particular, have been crucial to some of the biggest performance breakthroughs in tasks such as machine translation, part-of-speech tagging, and sentiment analysis. These individual advances have also benefited tasks at the intersection of NLP and Computer Vision. Inspired by this success, we study some existing neural image captioning models that provide near state-of-the-art performance, and try to enhance one such model. We also present a simple image captioning model that makes use of a CNN, an LSTM, and the beam search algorithm, and study its performance based on various qualitative and quantitative metrics.
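As a rough illustration of the beam search step mentioned above, the sketch below decodes a caption from a toy next-word model. It is not the authors' implementation: the function `beam_search`, the `step_fn` callback, the `TOY_MODEL` table, and the `<s>`/`</s>` tokens are all hypothetical names chosen for this example. In a real captioner, `step_fn` would presumably be an LSTM conditioned on CNN image features.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "language model": maps the last word to next-word probabilities.
# A real captioner would compute these with an LSTM conditioned on CNN features.
TOY_MODEL: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def toy_step(seq: List[str]) -> Dict[str, float]:
    """Return next-word probabilities given the partial caption."""
    return TOY_MODEL[seq[-1]]


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    start_token: str,
    end_token: str,
    beam_width: int = 3,
    max_len: int = 20,
) -> List[str]:
    """Keep the `beam_width` highest-scoring partial captions at each step."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [start_token])]
    finished: List[Tuple[float, List[str]]] = []

    for _ in range(max_len):
        # Expand every live hypothesis by every possible next word.
        candidates: List[Tuple[float, List[str]]] = []
        for score, seq in beams:
            for token, prob in step_fn(seq).items():
                if prob > 0.0:
                    candidates.append((score + math.log(prob), seq + [token]))
        if not candidates:
            break

        # Prune to the best `beam_width` candidates by summed log-probability.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == end_token:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break

    # Hypotheses that never emitted the end token still compete at the end.
    finished.extend(beams)
    return max(finished, key=lambda c: c[0])[1]


greedy = beam_search(toy_step, "<s>", "</s>", beam_width=1)
wider = beam_search(toy_step, "<s>", "</s>", beam_width=2)
print(greedy, wider)
```

With a beam width of 1 the search is greedy and commits to "a" at the first step, while a width of 2 keeps "the" alive long enough to find the more probable caption overall. This sketch scores hypotheses by raw summed log-probability; length normalization, which many captioning systems apply, is left out for brevity.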


