ResCap V1: Deep Residual Learning Based Image Captioning

02/14/2020 ∙ by Amir Ali, et al.

Image captioning is the process of generating a text description of an image based on the objects and actions it contains. Automatically producing such a description in natural language is a challenging task that requires expertise in both image processing and natural language processing. Most state-of-the-art approaches follow an encoding-decoding framework, which generates captions using a sequential recurrent prediction model. Ongoing advances in deep neural networks have significantly improved performance on this task. In this paper, we propose a deep residual learning based encoding-decoding framework for image captioning. Our main objective is to replace the encoder of the well-known Neural Image Caption Generator (NIC) model developed by Google with the residual network ResNet-50. We also discuss how the proposed model can be implemented, and we evaluate its performance using standard evaluation metrics.





