Auto-Encoding Graphical Inductive Bias for Descriptive Image Captioning

12/06/2018
by   Xu Yang, et al.

We propose the Scene Graph Auto-Encoder (SGAE), which incorporates a language inductive bias into the encoder-decoder image captioning framework to produce more human-like captions. Intuitively, humans use this inductive bias to compose collocations and make contextual inferences in discourse. For example, when we see the relation 'person on bike', it is natural to replace 'on' with 'ride' and infer 'person riding bike on a road' even when the road is not evident. Exploiting such a bias as a language prior is therefore expected to make conventional encoder-decoder models less likely to overfit to dataset bias and more focused on reasoning. Specifically, we use the scene graph, a directed graph (G) in which an object node is connected to adjective nodes and relationship nodes, to represent the complex structural layout of both the image (I) and the sentence (S). In the textual domain, we use SGAE to learn a dictionary (D) that helps to reconstruct sentences in the S→G→D→S pipeline, where D encodes the desired language prior; in the vision-language domain, we use the shared D to guide the encoder-decoder in the I→G→D→S pipeline. Thanks to the scene graph representation and the shared dictionary, the inductive bias can in principle be transferred across domains. We validate the effectiveness of SGAE on the challenging MS-COCO image captioning benchmark: our SGAE-based single model achieves a new state-of-the-art 127.8 CIDEr-D on the Karpathy split and a competitive 125.5 CIDEr-D (c40) on the official server, even when compared with ensemble models.
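
To make the G→D step of the two pipelines more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a scene graph can be held as objects, attributes, and (subject, predicate, object) triples, and that a node embedding is re-expressed as a softmax-weighted combination of the rows of a shared dictionary D. The names `SceneGraph` and `re_encode`, the toy embeddings, and the dot-product attention are all illustrative assumptions that the abstract does not specify.

```python
import math
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Directed graph G: object nodes with attribute and relationship nodes.

    Hypothetical container; the abstract does not define a storage format.
    """
    objects: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)  # object -> list of adjectives
    relations: list = field(default_factory=list)   # (subject, predicate, object)


def re_encode(x, dictionary):
    """Re-express embedding x as a convex combination of dictionary rows.

    Assumption: dot-product scores followed by a softmax. Returns the
    re-encoded vector and the attention weights over the dictionary.
    """
    scores = [sum(a * b for a, b in zip(x, row)) for row in dictionary]
    top = max(scores)                                # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(x)
    encoded = [sum(w * row[i] for w, row in zip(weights, dictionary))
               for i in range(dim)]
    return encoded, weights


# Toy example: the same dictionary would serve both S→G→D→S and I→G→D→S.
graph = SceneGraph(
    objects=["person", "bike"],
    attributes={"bike": ["red"]},
    relations=[("person", "on", "bike")],
)
shared_dictionary = [[1.0, 0.0], [0.0, 1.0]]   # toy D with two entries
node_embedding = [2.0, 0.0]                    # toy embedding of one graph node
encoded, weights = re_encode(node_embedding, shared_dictionary)
```

In this sketch the dictionary is the only component shared between the textual and vision-language pipelines, which is the route by which the language prior would reach the captioning model.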


research
12/06/2018

Auto-Encoding Scene Graphs for Image Captioning

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the languag...
research
03/09/2020

Deconfounded Image Captioning: A Causal Retrospect

The dataset bias in vision-language tasks is becoming one of the main pr...
research
04/18/2019

Learning to Collocate Neural Modules for Image Captioning

We do not speak word by word from scratch; our brain quickly structures ...
research
05/03/2023

Transforming Visual Scene Graphs to Image Captions

We propose to Transform Scene Graphs (TSG) into more descriptive caption...
research
06/29/2021

Contrastive Semantic Similarity Learning for Image Captioning Evaluation with Intrinsic Auto-encoder

Automatically evaluating the quality of image captions can be very chall...
research
10/04/2022

Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning

Humans tend to decompose a sentence into different parts like sth do sth...
research
03/21/2022

Self-Supervised Road Layout Parsing with Graph Auto-Encoding

Aiming for higher-level scene understanding, this work presents a neural...
