Relationship-based Neural Baby Talk

03/08/2021
by Fan Fu, et al.

Understanding the interactions between objects in an image is important for generating captions. In this paper, we propose a relationship-based neural baby talk (R-NBT) model that comprehensively investigates several types of pairwise object interactions by encoding each image via three different relationship-based graph attention networks (GATs). We study three main relationships: spatial relationships to explore geometric interactions, semantic relationships to extract semantic interactions, and implicit relationships to capture hidden information that cannot be modelled explicitly by the first two. We construct three relationship graphs with the objects in an image as nodes and the mutual relationships of pairwise objects as edges. By exploring the features of neighbouring regions individually via GATs, we integrate the different types of relationships into the visual features of each node. Experiments on the COCO dataset show that our proposed R-NBT model outperforms state-of-the-art models trained on the COCO dataset in three image caption generation tasks.
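
The graph construction described in the abstract (objects as nodes, pairwise relationships as edges, neighbour features aggregated by attention) can be illustrated with a minimal sketch. This is not the R-NBT implementation: the edge rule (`spatial_edges`, which connects boxes that overlap) and the scoring (plain dot products with no learned weights, in `attend`) are simplifying assumptions chosen for illustration, and all function names are hypothetical. A real GAT would apply learned projections before scoring.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def spatial_edges(boxes, min_iou=0.0):
    """Assumed edge rule: link object i to object j when their boxes overlap."""
    n = len(boxes)
    return {
        i: [j for j in range(n) if j != i and iou(boxes[i], boxes[j]) > min_iou]
        for i in range(n)
    }

def attend(features, edges):
    """One attention step over the graph.

    Each node is replaced by a softmax-weighted average of its own feature
    vector and those of its neighbours. Scores are unlearned dot products,
    a stand-in for the learned scoring of a real graph attention layer.
    """
    updated = []
    for i, f in enumerate(features):
        idx = [i] + edges[i]  # include a self-loop
        scores = [sum(x * y for x, y in zip(f, features[j])) for j in idx]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        updated.append([
            sum(weights[k] / total * features[j][d] for k, j in enumerate(idx))
            for d in range(len(f))
        ])
    return updated

# Three detected objects: two overlapping boxes and one isolated box.
boxes = [(0, 0, 2, 2), (1, 1, 3, 3), (10, 10, 12, 12)]
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
edges = spatial_edges(boxes)
updated = attend(features, edges)
```

In this toy input the two overlapping objects exchange information, while the isolated object keeps its original feature vector because its only neighbour is itself. Semantic or implicit graphs would differ only in how the edges are defined.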

research
02/15/2022

Hyper-relationship Learning Network for Scene Graph Generation

Generating informative scene graphs from images requires integrating and...
research
03/16/2020

AVR: Attention based Salient Visual Relationship Detection

Visual relationship detection aims to locate objects in images and recog...
research
07/05/2018

Detecting Visual Relationships Using Box Attention

In this paper we propose a new model for detecting visual relationships....
research
10/05/2020

Attention Guided Semantic Relationship Parsing for Visual Question Answering

Humans explain inter-object relationships with semantic labels that demo...
research
05/03/2020

On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs

We use large-scale corpora in six different gendered languages, along wi...
research
07/31/2017

Scene Graph Generation from Objects, Phrases and Region Captions

Object detection, scene graph generation and region captioning, which ar...
research
10/27/2020

SIRI: Spatial Relation Induced Network For Spatial Description Resolution

Spatial Description Resolution, as a language-guided localization task, ...