Transforming Visual Scene Graphs to Image Captions

05/03/2023
by   Xu Yang, et al.
0

We propose to Transform Scene Graphs (TSG) into more descriptive captions. In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) for embedding scene graphs. After embedding, different graph embeddings contain diverse specific knowledge for generating the words with different part-of-speech, e.g., object/attribute embedding is good for generating nouns/adjectives. Motivated by this, we design a Mixture-of-Expert (MOE)-based decoder, where each expert is built on MHA, for discriminating the graph embeddings to generate different kinds of words. Since both the encoder and decoder are built based on the MHA, as a result, we construct a homogeneous encoder-decoder unlike the previous heterogeneous ones which usually apply Fully-Connected-based GNN and LSTM-based decoder. The homogeneous architecture enables us to unify the training configuration of the whole model instead of specifying different training strategies for diverse sub-networks as in the heterogeneous pipeline, which releases the training difficulty. Extensive experiments on the MS-COCO captioning benchmark validate the effectiveness of our TSG. The code is in: https://anonymous.4open.science/r/ACL23_TSG.

READ FULL TEXT
research
12/06/2018

Auto-Encoding Scene Graphs for Image Captioning

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the languag...
research
05/01/2020

Alleviating the Inconsistency Problem of Applying Graph Neural Network to Fraud Detection

The graph-based model can help to detect suspicious fraud online. Owing ...
research
12/06/2018

Auto-Encoding Graphical Inductive Bias for Descriptive Image Captioning

We propose Scene Graph Auto-Encoder (SGAE) that incorporates the languag...
research
10/08/2021

KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering

Current Open-Domain Question Answering (ODQA) model paradigm often conta...
research
04/18/2019

Learning to Collocate Neural Modules for Image Captioning

We do not speak word by word from scratch; our brain quickly structures ...
research
01/28/2022

Graph autoencoder with constant dimensional latent space

Invertible transformation of large graphs into constant dimensional vect...
research
02/14/2020

Graph Deconvolutional Generation

Graph generation is an extremely important task, as graphs are found thr...

Please sign up or login with your details

Forgot password? Click here to reset