AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes

07/14/2023
by Guoyun Tu, et al.

Image captioning is a significant task spanning computer vision and natural language processing. We present AIC-AB NET, a novel Attribute-Information-Combined Attention-Based Network that combines a spatial attention architecture with text attributes in an encoder-decoder. During caption generation, adaptive spatial attention determines which image region best represents the image and whether to attend to the visual features or to the visual sentinel. Text attribute information is fed into the decoder synchronously to aid image recognition and reduce uncertainty. We evaluate AIC-AB NET on the MS COCO dataset and on a newly proposed Fashion dataset, which serves as a benchmark of single-object images. The results show that the proposed model outperforms the state-of-the-art baseline and the ablated models on both the MS COCO images and our single-object images. AIC-AB NET improves on the baseline adaptive attention network by 0.017 CIDEr on the MS COCO dataset and by 0.095 CIDEr on the Fashion dataset.
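To make the adaptive attention step concrete, here is a minimal NumPy sketch of one decoding step with a visual sentinel. It follows the formulation of the adaptive attention baseline ("Knowing When to Look"), not necessarily the exact AIC-AB NET implementation. The function name, the weight names (`W_v`, `W_s`, `W_g`, `w_h`), the dimensions, and the random inputs are all illustrative assumptions. How the text attributes are fused into the decoder is not specified in this abstract, so that part is left out.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_attention(V, s, h, W_v, W_s, W_g, w_h):
    """One adaptive-attention step (hypothetical helper, assumed shapes).

    V: (k, d) spatial region features, s: (d,) visual sentinel,
    h: (d,) decoder hidden state, W_*: (d, a) projections, w_h: (a,).
    """
    # Unnormalized scores for the k image regions.
    z = np.tanh(V @ W_v + h @ W_g) @ w_h      # shape (k,)
    # Score for the sentinel, computed with the same hidden-state projection.
    z_s = np.tanh(s @ W_s + h @ W_g) @ w_h    # scalar
    # Joint softmax over the k regions plus the sentinel.
    alpha_hat = softmax(np.append(z, z_s))    # shape (k + 1,)
    beta = alpha_hat[-1]                      # weight given to the sentinel
    # Context mixes the attended regions with the sentinel.
    context = alpha_hat[:-1] @ V + beta * s   # shape (d,)
    return context, alpha_hat[:-1], beta

# Illustrative call with random inputs (all sizes are assumptions).
rng = np.random.default_rng(0)
k, d, a = 5, 8, 6
V, s, h = rng.normal(size=(k, d)), rng.normal(size=d), rng.normal(size=d)
W_v, W_s, W_g = (rng.normal(size=(d, a)) for _ in range(3))
w_h = rng.normal(size=a)
context, alpha, beta = adaptive_attention(V, s, h, W_v, W_s, W_g, w_h)
print("region weights:", alpha, "sentinel gate:", beta)
```

A `beta` close to 1 means the step relies mostly on the sentinel rather than on the image regions, while a `beta` close to 0 means it relies mostly on the attended visual features.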

research
12/06/2016

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

Attention-based neural encoder-decoder frameworks have been widely adopt...
research
10/20/2022

Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation

Image-to-text tasks, such as open-ended image captioning and controllabl...
research
02/28/2020

Exploring and Distilling Cross-Modal Information for Image Captioning

Recently, attention-based encoder-decoder models have been used extensiv...
research
12/15/2016

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

Along with the prosperity of recurrent neural network in modelling seque...
research
05/29/2019

Vision-to-Language Tasks Based on Attributes and Attention Mechanism

Vision-to-language tasks aim to integrate computer vision and natural la...
research
05/24/2017

Attention-based Natural Language Person Retrieval

Following the recent progress in image classification and captioning usi...
research
06/15/2019

Generating Diverse and Informative Natural Language Fashion Feedback

Recent advances in multi-modal vision and language tasks enable a new se...
