Language Models for Image Captioning: The Quirks and What Works

05/07/2015
by Jacob Devlin, et al.

Two recent approaches have achieved state-of-the-art results in image captioning. The first uses a pipelined process: a convolutional neural network (CNN) trained on images generates a set of candidate words, and a maximum entropy (ME) language model then arranges these words into a coherent sentence. The second feeds the penultimate activation layer of the CNN as input to a recurrent neural network (RNN), which then generates the caption sequence. In this paper, we compare the merits of these two language modeling approaches for the first time, using the same state-of-the-art CNN as input to both. We examine issues in the different approaches, including linguistic irregularities, caption repetition, and dataset overlap. By combining key aspects of the ME and RNN methods, we achieve a new best result, surpassing all previously published results on the benchmark COCO dataset. However, the gains we see in BLEU do not translate to human judgments.
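To make the second (CNN-to-RNN) approach concrete, here is a minimal sketch, assuming a plain Elman RNN, a toy vocabulary, random untrained weights, and greedy decoding. The names `greedy_caption`, `VOCAB`, `W_img`, `W_emb`, `W_hh`, and `W_out` are hypothetical, and the feature vector is a random stand-in for real CNN activations. The trained models compared in the paper may use different recurrent units, conditioning, and decoding. The sketch only shows the data flow: the image feature sets the RNN's initial state, and each word is chosen from the whole vocabulary, whereas in the pipelined approach the language model arranges words from the CNN's candidate set.

```python
import math
import random

# Hypothetical toy vocabulary; index 0 doubles as begin and end-of-sentence marker.
VOCAB = ["</s>", "a", "dog", "on", "the", "grass", "running"]
END = 0


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rng, rows, cols, scale=1.0):
    """Random Gaussian matrix standing in for trained weights (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def greedy_caption(cnn_feature, params, max_len=10):
    """Greedily decode a caption from an image feature with an Elman RNN.

    Assumption: the CNN's penultimate-layer vector is projected into the
    RNN's initial hidden state; real captioning models differ in detail.
    """
    W_img, W_emb, W_hh, W_out = params
    # Image feature -> initial hidden state.
    h = [math.tanh(z) for z in matvec(W_img, cnn_feature)]
    prev = END
    words = []
    for _ in range(max_len):
        # Recurrent update: embedding of the previous word plus recurrent term.
        pre = [e + r for e, r in zip(W_emb[prev], matvec(W_hh, h))]
        h = [math.tanh(z) for z in pre]
        logits = matvec(W_out, h)  # one score per vocabulary word
        nxt = max(range(len(logits)), key=logits.__getitem__)  # greedy pick
        if nxt == END:
            break
        words.append(VOCAB[nxt])
        prev = nxt
    return words


rng = random.Random(0)
feat_dim, hidden, V = 16, 8, len(VOCAB)
params = (
    rand_matrix(rng, hidden, feat_dim),    # W_img: feature -> hidden
    rand_matrix(rng, V, hidden),           # W_emb: one embedding row per word
    rand_matrix(rng, hidden, hidden, 0.5), # W_hh: hidden -> hidden
    rand_matrix(rng, V, hidden),           # W_out: hidden -> vocabulary scores
)
feature = [rng.gauss(0.0, 1.0) for _ in range(feat_dim)]  # stand-in CNN vector
caption = greedy_caption(feature, params)
print(" ".join(caption))
```

With untrained weights the output is not a meaningful caption. The `max_len` cap guarantees termination even if the end token is never chosen.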
