Rethinking Visual Relationships for High-level Image Understanding

02/01/2019
by   Yuanzhi Liang, et al.
18

Relationships, as the bond of isolated entities in images, reflect the interaction between objects and lead to a semantic understanding of scenes. Suffering from visually-irrelevant relationships in current scene graph datasets, the utilization of relationships for semantic tasks is difficult. The datasets widely used in scene graph generation tasks are splitted from Visual Genome by label frequency, which even can be well solved by statistical counting. To encourage further development in relationships, we propose a novel method to mine more valuable relationships by automatically filtering out visually-irrelevant relationships. Then, we construct a new scene graph dataset named Visually-Relevant Relationships Dataset (VrR-VG) from Visual Genome. We evaluate several existing methods in scene graph generation in our dataset. The results show the performances degrade significantly compared to the previous dataset and the frequency analysis do not work on our dataset anymore. Moreover, we propose a method to learn feature representations of instances, attributes, and visual relationships jointly from images, then we apply the learned features to image captioning and visual question answering respectively. The improvements on the both tasks demonstrate the efficiency of the features with relation information and the richer semantic information provided in our dataset.

READ FULL TEXT

page 1

page 2

page 6

page 11

page 12

research
09/23/2021

Scene Graph Generation for Better Image Captioning?

We investigate the incorporation of visual relationships into the task o...
research
11/25/2021

Scene Graph Generation with Geometric Context

Scene Graph Generation has gained much attention in computer vision rese...
research
08/19/2021

Semantic Compositional Learning for Low-shot Scene Graph Generation

Scene graphs provide valuable information to many downstream tasks. Many...
research
02/23/2016

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

Despite progress in perceptual tasks such as image classification, compu...
research
11/26/2018

Attentive Relational Networks for Mapping Images to Scene Graphs

Scene graph generation refers to the task of automatically mapping an im...
research
08/27/2018

Improving Information Extraction from Images with Learned Semantic Models

Many applications require an understanding of an image that goes beyond ...
research
04/25/2019

Scene Graph Prediction with Limited Labels

Visual knowledge bases such as Visual Genome power numerous applications...

Please sign up or login with your details

Forgot password? Click here to reset