Phrase-based Image Captioning

02/12/2015
by Rémi Lebret, et al.

Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely bilinear model that learns a metric between an image representation (generated from a previously trained Convolutional Neural Network) and the phrases that are used to describe the image. The system is then able to infer phrases from a given image sample. Based on caption syntax statistics, we propose a simple language model that can produce relevant descriptions for a given test image using the inferred phrases. Our approach, which is considerably simpler than state-of-the-art models, achieves comparable results on two popular datasets for the task: Flickr30k and the recently proposed Microsoft COCO.
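
To make the bilinear scoring idea concrete, here is a minimal sketch of how an image vector and a phrase vector might be compared through a single matrix, and how the highest-scoring phrases could then be picked for an image. It is an illustration under stated assumptions, not the authors' implementation: the function names (`bilinear_score`, `top_phrases`), the matrix `W`, the tiny dimensions, and all numeric values are hypothetical, and the training of `W` and of the phrase vectors is not shown.

```python
def bilinear_score(image_vec, phrase_vec, W):
    """Return phrase_vec^T W image_vec for one image/phrase pair.

    W has one row per phrase dimension and one column per image dimension
    (hypothetical layout chosen for this sketch).
    """
    # Project the image vector into the phrase space: W @ image_vec.
    projected = [sum(w * x for w, x in zip(row, image_vec)) for row in W]
    # Dot product between the phrase vector and the projected image.
    return sum(p * q for p, q in zip(phrase_vec, projected))


def top_phrases(image_vec, phrase_table, W, k=2):
    """Return the k phrases with the highest bilinear score for the image."""
    ranked = sorted(
        phrase_table,
        key=lambda phrase: bilinear_score(image_vec, phrase_table[phrase], W),
        reverse=True,
    )
    return ranked[:k]


# Toy values, assumed purely for illustration.
W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
IMAGE_VEC = [2.0, 1.0, 5.0]  # stands in for a CNN feature vector
PHRASES = {
    "a man": [1.0, 0.0],
    "a wooden table": [0.0, 1.0],
    "in the park": [-1.0, 0.0],
}

if __name__ == "__main__":
    print(top_phrases(IMAGE_VEC, PHRASES, W, k=2))
```

In a real system the image vector would come from a pretrained CNN and `W` would be learned from image and caption pairs; the inferred phrases would then be handed to the language model that assembles a sentence.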


research
12/29/2014

Simple Image Description Generator via a Linear Phrase-Based Approach

Generating a novel textual description of an image is an interesting pro...
research
08/20/2016

phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning

A picture is worth a thousand words. Not until recently, however, we not...
research
11/11/2017

Phrase-based Image Captioning with Hierarchical LSTM Model

Automatic generation of caption to describe the content of an image has ...
research
09/21/2016

Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

Automatically describing the content of an image is a fundamental proble...
research
09/04/2023

DeViL: Decoding Vision features into Language

Post-hoc explanation methods have often been criticised for abstracting ...
research
08/07/2020

Textual Description for Mathematical Equations

Reading of mathematical expression or equation in the document images is...
research
02/25/2021

Hybrid deep neural network for Bangla automated image descriptor

Automated image to text generation is a computationally challenging comp...
