Improving Visual Relationship Detection using Semantic Modeling of Scene Descriptions

09/01/2018
by   Stephan Baier, et al.
0

Structured scene descriptions of images are useful for the automatic processing and querying of large image databases. We show how the combination of a semantic and a visual statistical model can improve on the task of mapping images to their associated scene description. In this paper we consider scene descriptions which are represented as a set of triples (subject, predicate, object), where each triple consists of a pair of visual objects, which appear in the image, and the relationship between them (e.g. man-riding-elephant, man-wearing-hat). We combine a standard visual model for object detection, based on convolutional neural networks, with a latent variable model for link prediction. We apply multiple state-of-the-art link prediction methods and compare their capability for visual relationship detection. One of the main advantages of link prediction methods is that they can also generalize to triples, which have never been observed in the training data. Our experimental results on the recently published Stanford Visual Relationship dataset, a challenging real world dataset, show that the integration of a semantic model using link prediction methods can significantly improve the results for visual relationship detection. Our combined approach achieves superior performance compared to the state-of-the-art method from the Stanford computer vision group.

READ FULL TEXT
research
08/27/2018

Improving Information Extraction from Images with Learned Semantic Models

Many applications require an understanding of an image that goes beyond ...
research
06/18/2022

VReBERT: A Simple and Flexible Transformer for Visual Relationship Detection

Visual Relationship Detection (VRD) impels a computer vision model to 's...
research
04/22/2020

Semantic Entity Enrichment by Leveraging Multilingual Descriptions for Link Prediction

Most Knowledge Graphs (KGs) contain textual descriptions of entities in ...
research
09/11/2018

Context-Dependent Diffusion Network for Visual Relationship Detection

Visual relationship detection can bridge the gap between computer vision...
research
12/13/2019

Fast Computation of Katz Index for Efficient Processing of Link Prediction Queries

Network proximity computations are among the most common operations in v...
research
02/23/2017

ViP-CNN: Visual Phrase Guided Convolutional Neural Network

As the intermediate level task connecting image captioning and object de...
research
12/12/2019

Learning Effective Visual Relationship Detector on 1 GPU

We present our winning solution to the Open Images 2019 Visual Relations...

Please sign up or login with your details

Forgot password? Click here to reset