simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions

08/27/2018
by   Fenglin Liu, et al.

The encoder-decoder framework has shown recent success in image captioning. Visual attention, which is good at capturing detail, and semantic attention, which is good at comprehensiveness, have been proposed separately to ground the caption in the image. In this paper, we propose the Stepwise Image-Topic Merging Network (simNet), which uses both kinds of attention at the same time. At each time step of caption generation, the decoder adaptively merges the attentive information from the extracted topics and from the image according to the generated context, so that visual and semantic information are combined effectively. The proposed approach is evaluated on two benchmark datasets and achieves state-of-the-art performance. (The code is available at https://github.com/lancopku/simNet)
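The per-step merging idea described above can be illustrated with a small sketch. This is not taken from the paper or its repository: the function names (`attend`, `merge_step`), the dot-product attention, and the scalar sigmoid gate are all assumptions chosen to keep the example short, and the actual simNet formulation may differ.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    # Dot-product attention (an assumption): weight each item by its
    # similarity to the query, then return the weighted average.
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    return [sum(w * item[d] for w, item in zip(weights, items))
            for d in range(dim)]


def merge_step(context, regions, topics):
    # One decoding step: attend to image regions (visual attention) and
    # to topic embeddings (semantic attention), both conditioned on the
    # current context vector.
    visual = attend(context, regions)
    semantic = attend(context, topics)
    # Hypothetical scalar gate: favors whichever source agrees more
    # with the context at this step.
    gate = 1.0 / (1.0 + math.exp(-(dot(context, visual) - dot(context, semantic))))
    merged = [gate * v + (1.0 - gate) * s for v, s in zip(visual, semantic)]
    return merged, gate


if __name__ == "__main__":
    context = [1.0, 0.0]                   # decoder state at this step
    regions = [[1.0, 0.0], [0.0, 1.0]]     # toy image-region features
    topics = [[0.0, 1.0], [0.0, 1.0]]      # toy topic embeddings
    merged, gate = merge_step(context, regions, topics)
    print(gate, merged)
```

In this toy setup the gate is recomputed at every step from the current context, so the balance between visual and semantic information can shift from word to word rather than being fixed for the whole caption.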


