ResCap V1: Deep Residual Learning Based Image Captioning

by   Amir Ali, et al.

Image Captioning alludes to the process of generating text description from an image based on the image's objects and actions. It is a very strenuous task to create an image description automatically using any language phrase. It requires the expertise of both image processing as well as natural language processing. For describing image content, the bulk of state-of-the-art approaches follow an encoding-decoding framework, which generates captions using a sequential recurrent prediction model. The performance of this task has been significantly enhanced by ongoing advancement in deep neural networks. In this paper, we propose a deep residual learning based encoding-decoding framework for image captioning tasks. Our main objective is to replace the encoder part of the renowned Neural Image Caption Generator (NIC) model developed by Google with the residual network known as ResNet-50. In addition to that, we have discussed how our proposed model can be implemented. In the end, we have also evaluated the performance of our model using standard evaluation matrices.


page 9

page 13

page 14

page 15

page 16

page 17

page 18

page 19


Image Captioning using Deep Neural Architectures

Automatically creating the description of an image using any natural lan...

Image Captioning based on Deep Reinforcement Learning

Recently it has shown that the policy-gradient methods for reinforcement...

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Image captioning is a challenging problem owing to the complexity in und...

Neural Twins Talk Alternative Calculations

Inspired by how the human brain employs a higher number of neural pathwa...

Efficient CNN-LSTM based Image Captioning using Neural Network Compression

Modern Neural Networks are eminent in achieving state of the art perform...

From Show to Tell: A Survey on Image Captioning

Connecting Vision and Language plays an essential role in Generative Int...

Image Captioning with Semantic Attention

Automatically generating a natural language description of an image has ...