Image Captioning with Semantic Attention

03/12/2016
by Quanzeng You, et al.

Automatically generating a natural language description of an image has recently attracted interest, both because of its importance in practical applications and because it connects two major fields of artificial intelligence: computer vision and natural language processing. Existing approaches are either top-down, starting from the gist of an image and converting it into words, or bottom-up, first producing words that describe various aspects of an image and then combining them. In this paper, we propose a new algorithm that combines both approaches through a model of semantic attention. Our algorithm learns to selectively attend to semantic concept proposals and fuse them into the hidden states and outputs of recurrent neural networks. The selection and fusion form a feedback loop connecting the top-down and bottom-up computation. We evaluate our algorithm on two public benchmarks: Microsoft COCO and Flickr30K. Experimental results show that our algorithm consistently and significantly outperforms state-of-the-art approaches across different evaluation metrics.
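To make the idea of attending to concept proposals and fusing them into a recurrent state more concrete, here is a minimal sketch in plain Python. It is not the paper's formulation: the dot-product scoring, the convex `mix` fusion, the two-dimensional toy vectors, and the names `attend` and `fuse` are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden, concepts):
    """Weight each concept vector by its dot product with the hidden state.

    Returns a dict of attention weights per concept and the weighted
    average (context) vector. Dot-product scoring is an assumption here.
    """
    scores = [sum(h * c for h, c in zip(hidden, vec)) for vec in concepts.values()]
    weights = softmax(scores)
    context = [0.0] * len(hidden)
    for w, vec in zip(weights, concepts.values()):
        for i, value in enumerate(vec):
            context[i] += w * value
    return dict(zip(concepts, weights)), context


def fuse(hidden, context, mix=0.5):
    """Blend the attended context back into the hidden state (hypothetical rule)."""
    return [(1 - mix) * h + mix * c for h, c in zip(hidden, context)]


# Toy concept proposals (bottom-up) and a toy hidden state (top-down).
concepts = {"dog": [1.0, 0.0], "frisbee": [0.0, 1.0], "grass": [0.5, 0.5]}
hidden = [2.0, 0.0]

weights, context = attend(hidden, concepts)
new_hidden = fuse(hidden, context)
```

In this toy setup the hidden state points along the "dog" direction, so "dog" receives the largest attention weight, and the fused state would then drive the next word prediction, closing the loop between the two directions of computation.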

research
09/21/2016

Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

Automatically describing the content of an image is a fundamental proble...
research
11/05/2016

Boosting Image Captioning with Attributes

Automatically describing an image with a natural language has been an em...
research
10/15/2018

Bringing back simplicity and lightliness into neural image captioning

Neural Image Captioning (NIC) or neural caption generation has attracted...
research
07/14/2021

From Show to Tell: A Survey on Image Captioning

Connecting Vision and Language plays an essential role in Generative Int...
research
05/29/2019

Vision-to-Language Tasks Based on Attributes and Attention Mechanism

Vision-to-language tasks aim to integrate computer vision and natural la...
research
11/17/2014

Show and Tell: A Neural Image Caption Generator

Automatically describing the content of an image is a fundamental proble...
