A Hybrid Model for Combining Neural Image Caption and k-Nearest Neighbor Approach for Image Captioning

05/09/2021
by Kartik Arora, et al.

We propose a hybrid model that integrates two popular image captioning methods to generate a text summary describing the contents of an image. The two methods are the Neural Image Caption (NIC) model and the k-nearest neighbor approach, each trained individually on the training set. From the outputs of the two models on the validation set, we extract a set of five features, which are used to train a logistic regression classifier. The binary ground-truth labels for the classifier are generated by comparing the BLEU-4 scores of the two models. At test time, each input image is first passed separately through the two models to generate two candidate captions. The five-dimensional feature vector extracted from the two models is then passed to the logistic regression classifier, which decides which of the two candidate captions is returned as the final caption. On the benchmark Flickr8k dataset, our implementation of the k-nearest neighbor model achieves a BLEU-4 score of 15.95 and the NIC model achieves a BLEU-4 score of 16.01. The proposed hybrid model achieves a BLEU-4 score of 18.20, roughly 2.2 points above either individual model, which supports the validity of our approach.
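The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the five features, so the feature vectors here are synthetic placeholders, the tie-breaking rule in `make_labels` is an assumption, and all function names (`make_labels`, `train_logreg`, `choose_caption`) are hypothetical. The logistic regression is a plain gradient-descent version written with the standard library only.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_labels(bleu_knn, bleu_nic):
    # Binary ground truth from per-image BLEU-4 scores:
    # 1 if the NIC caption scores at least as high as the k-NN caption.
    # (Assumption: ties go to NIC; the abstract does not specify this.)
    return [1 if n >= k else 0 for k, n in zip(bleu_knn, bleu_nic)]

def train_logreg(X, y, lr=0.5, epochs=300):
    # Batch gradient descent on the mean log-loss.
    n_feat = len(X[0])
    w, b = [0.0] * n_feat, 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def choose_caption(features, caption_knn, caption_nic, w, b):
    # Return the caption the classifier predicts to have the higher BLEU-4.
    p_nic = sigmoid(sum(wj * xj for wj, xj in zip(w, features)) + b)
    return caption_nic if p_nic >= 0.5 else caption_knn

# Synthetic "validation set": 200 images, 5 placeholder features each.
# By construction, only feature 0 influences which model scores higher.
random.seed(0)
X = [[random.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(200)]
bleu_knn = [0.5 for _ in X]                # made-up scores
bleu_nic = [0.5 + 0.3 * x[0] for x in X]   # made-up scores
y = make_labels(bleu_knn, bleu_nic)
w, b = train_logreg(X, y)

# "Test time": pick one of the two candidate captions for a new image.
final = choose_caption([0.9, 0.0, 0.0, 0.0, 0.0],
                       "a dog runs on grass", "a brown dog is running", w, b)
print(final)
```

In a real pipeline the feature vectors and BLEU-4 scores would come from the two trained captioners on held-out validation images rather than from a random generator.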

