Exploring Visual Relationship for Image Captioning

09/19/2018
by Ting Yao, et al.

It has long been believed that modeling the relationships between objects helps in representing and eventually describing an image. Nevertheless, there has been little evidence supporting this idea for image description generation. In this paper, we introduce a new design that explores the connections between objects for image captioning within an attention-based encoder-decoder framework. Specifically, we present a Graph Convolutional Networks plus Long Short-Term Memory (dubbed GCN-LSTM) architecture that integrates both semantic and spatial object relationships into the image encoder. Technically, we build graphs over the objects detected in an image based on their spatial and semantic connections. The representation of each object region proposal is then refined by leveraging the graph structure through GCN. With the learnt region-level features, GCN-LSTM capitalizes on an LSTM-based captioning framework with an attention mechanism for sentence generation. Extensive experiments are conducted on the COCO image captioning dataset, and superior results are reported in comparison to state-of-the-art approaches. Most remarkably, GCN-LSTM increases CIDEr-D performance from 120.1 to 128.7.
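
As a rough illustration of the encoder-decoder design described above, the sketch below shows a single graph-convolution layer that refines detected-region features using a relationship graph, followed by one step of an attention-based LSTM decoder. This is a minimal sketch only, not the authors' implementation: the framework (PyTorch), the class names, and the single stand-in relation graph are assumptions for illustration, whereas the paper builds separate semantic and spatial graphs within a full captioning pipeline.

```python
# Minimal sketch (not the authors' code): a GCN-style refinement of region
# features over a relation graph, plus one attention-LSTM decoding step.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RegionGCNLayer(nn.Module):
    """Refine each region's feature by aggregating over related regions."""

    def __init__(self, dim):
        super().__init__()
        self.w_self = nn.Linear(dim, dim)    # transform of the region itself
        self.w_neigh = nn.Linear(dim, dim)   # transform of its graph neighbours

    def forward(self, regions, adjacency):
        # regions:   (num_regions, dim) detected-object features
        # adjacency: (num_regions, num_regions) spatial/semantic relation graph
        deg = adjacency.sum(dim=1, keepdim=True).clamp(min=1.0)
        neigh = adjacency @ self.w_neigh(regions) / deg   # mean over neighbours
        return F.relu(self.w_self(regions) + neigh)


class AttentionLSTMDecoder(nn.Module):
    """One decoding step: attend over refined regions, then update the LSTM."""

    def __init__(self, dim, vocab_size):
        super().__init__()
        self.attn = nn.Linear(2 * dim, 1)
        self.lstm = nn.LSTMCell(2 * dim, dim)
        self.embed = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, vocab_size)

    def step(self, word_ids, state, regions):
        h, c = state
        # Soft attention: score each region against the current hidden state.
        scores = self.attn(torch.cat([regions, h.expand_as(regions)], dim=-1))
        context = (F.softmax(scores, dim=0) * regions).sum(dim=0, keepdim=True)
        h, c = self.lstm(torch.cat([self.embed(word_ids), context], dim=-1), (h, c))
        return self.out(h), (h, c)


# Toy usage: 5 detected regions, 512-d features, a small vocabulary.
dim, vocab = 512, 1000
regions = torch.randn(5, dim)
adjacency = (torch.rand(5, 5) > 0.5).float()   # hypothetical relation graph
refined = RegionGCNLayer(dim)(regions, adjacency)
decoder = AttentionLSTMDecoder(dim, vocab)
state = (torch.zeros(1, dim), torch.zeros(1, dim))
logits, state = decoder.step(torch.tensor([1]), state, refined)
print(logits.shape)  # torch.Size([1, 1000]): next-word scores for this step
```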

Related research

09/09/2019  Hierarchy Parsing for Image Captioning
It is always well believed that parsing an image into constituent visual...

12/04/2019  Better Understanding Hierarchical Visual Relationship for Image Caption
The Convolutional Neural Network (CNN) has been the dominant image featu...

05/06/2021  Exploring Explicit and Implicit Visual Relationships for Image Captioning
Image captioning is one of the most challenging tasks in AI, which aims ...

08/01/2019  Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation
Image paragraph generation is the task of producing a coherent story (us...

08/06/2019  Aligning Linguistic Words and Visual Semantic Units for Image Captioning
Image captioning attempts to generate a sentence composed of several lin...

08/05/2021  Dual Graph Convolutional Networks with Transformer and Curriculum Learning for Image Captioning
Existing image captioning methods just focus on understanding the relati...

07/29/2021  ReFormer: The Relational Transformer for Image Captioning
Image captioning is shown to be able to achieve a better performance by ...