Learning to Caption Images with Two-Stream Attention and Sentence Auto-Encoder

11/22/2019
by   Arushi Goel, et al.
0

Automatically generating natural language descriptions from an image is a challenging problem in artificial intelligence that requires a good understanding of the correlations between visual and textual cues. To bridge these two modalities, state-of-the-art methods commonly use a dynamic interface between image and text, called attention, that learns to identify related image parts to estimate the next word conditioned on the previous steps. While this mechanism is effective, it fails to find the right associations between visual and textual cues when they are noisy. In this paper we propose two novel approaches to address this issue - (i) a two-stream attention mechanism that can automatically discover latent categories and relate them to image regions based on the previously generated words, (ii) a regularization technique that encapsulates the syntactic and semantic structure of captions and improves the optimization of the image captioning model. Our qualitative and quantitative results demonstrate remarkable improvements on the MSCOCO dataset setting and lead to new state-of-the-art performances for image captioning.

READ FULL TEXT

page 2

page 7

research
06/15/2016

Watch What You Just Said: Image Captioning with Text-Conditional Attention

Attention mechanisms have attracted considerable interest in image capti...
research
06/16/2022

Image Captioning based on Feature Refinement and Reflective Decoding

Automatically generating a description of an image in natural language i...
research
12/03/2016

Areas of Attention for Image Captioning

We propose "Areas of Attention", a novel attention-based model for autom...
research
08/27/2018

A neural attention model for speech command recognition

This paper introduces a convolutional recurrent network with attention f...
research
10/26/2020

Reading Between the Lines: Exploring Infilling in Visual Narratives

Generating long form narratives such as stories and procedures from mult...
research
11/02/2020

Boost Image Captioning with Knowledge Reasoning

Automatically generating a human-like description for a given image is a...
research
06/26/2019

A Deep Decoder Structure Based on WordEmbedding Regression for An Encoder-Decoder Based Model for Image Captioning

Generating textual descriptions for images has been an attractive proble...

Please sign up or login with your details

Forgot password? Click here to reset