ResCap V1: Deep Residual Learning Based Image Captioning

02/14/2020

Image captioning refers to the process of generating a textual description of an image based on the objects and actions it contains. Producing such a description automatically in natural language is a challenging task, since it requires expertise in both image processing and natural language processing. To describe image content, most state-of-the-art approaches follow an encoding-decoding framework that generates captions using a sequential recurrent prediction model. Performance on this task has improved significantly with ongoing advances in deep neural networks. In this paper, we propose a deep residual learning based encoding-decoding framework for image captioning. Our main objective is to replace the encoder of the well-known Neural Image Caption Generator (NIC) model developed by Google with the residual network known as ResNet-50. We also discuss how the proposed model can be implemented. Finally, we evaluate the performance of our model using standard evaluation metrics.
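
To make the encoding-decoding idea concrete, the following is a minimal, dependency-free sketch of its two ingredients: a residual (skip) connection, which is the building block of networks such as ResNet-50, and a greedy sequential decoding loop that emits one word at a time. It is a toy illustration under stated assumptions, not the paper's implementation: the names `residual_block`, `encode`, `greedy_decode`, and `toy_step` are hypothetical, the "image" is a short list of floats, and the scoring function is a hand-written stand-in for a trained recurrent decoder.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def residual_block(x: Vector, transform: Callable[[Vector], Vector]) -> Vector:
    """Return transform(x) + x: the identity shortcut that defines residual learning."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("transform must preserve dimensionality for the identity shortcut")
    return [a + b for a, b in zip(fx, x)]


def encode(image: Vector, transforms: Sequence[Callable[[Vector], Vector]]) -> Vector:
    """Toy encoder: a stack of residual blocks standing in for a real residual network."""
    features = image
    for transform in transforms:
        features = residual_block(features, transform)
    return features


def greedy_decode(
    features: Vector,
    step: Callable[[Vector, List[str]], Dict[str, float]],
    start: str = "<s>",
    end: str = "</s>",
    max_len: int = 20,
) -> List[str]:
    """Generate a caption token by token, feeding each prediction back into the next step."""
    caption = [start]
    for _ in range(max_len):
        scores = step(features, caption)      # token -> score for the next position
        token = max(scores, key=scores.get)   # greedy choice: highest-scoring token
        if token == end:
            break
        caption.append(token)
    return caption[1:]  # drop the start marker


def toy_step(features: Vector, caption: List[str]) -> Dict[str, float]:
    """Hypothetical scorer (assumption): prefers a fixed word order and ignores the features."""
    vocabulary = ["a", "dog", "runs", "</s>"]
    position = len(caption) - 1
    return {word: 1.0 if i == position else 0.0 for i, word in enumerate(vocabulary)}


def halve(v: Vector) -> Vector:
    """Toy residual function F(x) = 0.5 * x (assumption, stands in for learned layers)."""
    return [0.5 * a for a in v]


if __name__ == "__main__":
    features = encode([2.0, 4.0], [halve, halve])
    print(features)
    print(greedy_decode(features, toy_step))
```

In a real system the encoder would be a pretrained convolutional network and `step` would be a trained recurrent model conditioned on the image features; the control flow, however, has the same shape as this sketch.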
