Comparative evaluation of CNN architectures for Image Caption Generation

02/23/2021
by Sulabh Katiyar, et al.
Aided by recent advances in Deep Learning, Image Caption Generation has seen tremendous progress over the last few years. Most methods use transfer learning: a pre-trained Convolutional Neural Network (CNN) extracts visual information in the form of image features, and a Caption Generator module then transforms those features into the output sentence. Different methods have used different CNN architectures and, to the best of our knowledge, no systematic study compares the relative efficacy of these architectures for extracting the visual information. In this work, we evaluate 17 different CNNs on two popular Image Caption Generation frameworks: the first based on the Neural Image Caption (NIC) generation model and the second based on the Soft-Attention framework. We observe that the model complexity of a CNN, as measured by its number of parameters, and its accuracy on the Object Recognition task do not necessarily correlate with its efficacy at feature extraction for the Image Caption Generation task.
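To make the difference between the two frameworks concrete, the following is a minimal sketch, not the authors' implementation, of how each one might consume CNN features. It assumes the CNN output is a small grid of region vectors; the function names `global_feature` and `soft_attention`, the toy feature values, and the alignment scores are all hypothetical. Mean pooling is used here as one plausible way to obtain the single image vector a NIC-style decoder receives, while the soft-attention context is a softmax-weighted sum over regions.

```python
import math

def global_feature(regions):
    """NIC-style input (assumed here): one vector per image, the mean over regions."""
    n = len(regions)
    dim = len(regions[0])
    return [sum(r[d] for r in regions) / n for d in range(dim)]

def soft_attention(regions, scores):
    """Soft-attention context: softmax over scores, then a weighted sum of regions."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(regions[0])
    context = [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]
    return weights, context

# Toy 3-region, 2-dimensional feature map standing in for real CNN output.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical alignment scores; in a real model these depend on the decoder state.
scores = [0.0, 0.0, 0.0]

weights, context = soft_attention(regions, scores)
pooled = global_feature(regions)
```

With equal scores the attention weights are uniform, so the context vector coincides with the mean-pooled feature; unequal scores shift the context toward the higher-scoring regions, which is the extra freedom the attention-based framework has over a single global vector.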

research
03/06/2018

A Non-Technical Survey on Deep Convolutional Neural Network Architectures

Artificial neural networks have recently shown great results in many dis...
research
08/12/2017

Flower Categorization using Deep Convolutional Neural Networks

We have developed a deep learning network for classification of differen...
research
05/02/2015

Object-Scene Convolutional Neural Networks for Event Recognition in Images

Event recognition from still images is of great importance for image und...
research
07/03/2023

Predicting beauty, liking, and aesthetic quality: A comparative analysis of image databases for visual aesthetics research

In the fields of Experimental and Computational Aesthetics, numerous ima...
research
12/04/2020

Is It a Plausible Colour? UCapsNet for Image Colourisation

Human beings can imagine the colours of a grayscale image with no partic...
research
06/09/2017

MirBot, a collaborative object recognition system for smartphones using convolutional neural networks

MirBot is a collaborative application for smartphones that allows users ...
research
04/05/2020

Comparative Analysis of Multiple Deep CNN Models for Waste Classification

Waste is a wealth in a wrong place. Our research focuses on analyzing po...
