Union Visual Translation Embedding for Visual Relationship Detection and Scene Graph Generation

05/28/2019
by Zih-Siou Hung, et al.

Relations amongst entities play a central role in image understanding. Due to the combinatorial complexity of modeling (subject, predicate, object) relation triplets, it is crucial to develop a method that can not only recognize seen relations, but also generalize well to unseen cases. Inspired by the Visual Translation Embedding network (VTransE), we propose the Union Visual Translation Embedding network (UVTransE) to capture both common and rare relations with better accuracy. UVTransE maps the subject, the object, and the union (subject, object) image regions into a low-dimensional relation space where a predicate can be expressed as a vector subtraction, such that predicate ≈ union (subject, object) - subject - object. We present a comprehensive evaluation of our method on multiple challenging benchmarks: the Visual Relationship Detection (VRD) dataset; the UnRel dataset for rare and unusual relations; two subsets of Visual Genome; and the Open Images Challenge. Our approach decisively outperforms VTransE and comes close to or exceeds the state of the art across a range of settings, from small-scale to large-scale datasets and from common to previously unseen relations. On Visual Genome and Open Images, it also achieves promising results on the recently introduced task of scene graph generation.
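
The subtraction described above, predicate ≈ union (subject, object) - subject - object, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the helper names `project` and `predicate_embedding`, the use of plain linear maps as stand-ins for the learned embeddings, the random weights, and the toy dimensions. It is not the authors' implementation, only a picture of the arithmetic in the relation space.

```python
import random

def project(features, weights):
    """Linear map into the relation space (hypothetical stand-in for a learned embedding)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def predicate_embedding(subj_feat, obj_feat, union_feat, w_s, w_o, w_u):
    """Compute predicate ≈ union - subject - object in the relation space."""
    s = project(subj_feat, w_s)
    o = project(obj_feat, w_o)
    u = project(union_feat, w_u)
    return [ui - si - oi for ui, si, oi in zip(u, s, o)]

def random_matrix(rows, cols, rng):
    """Random weights; a real model would learn these from data."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def random_vector(size, rng):
    """Random region features; a real model would take these from a detector."""
    return [rng.random() for _ in range(size)]

rng = random.Random(0)
feat_dim, rel_dim = 8, 3  # toy sizes, chosen arbitrarily

w_s = random_matrix(rel_dim, feat_dim, rng)
w_o = random_matrix(rel_dim, feat_dim, rng)
w_u = random_matrix(rel_dim, feat_dim, rng)

subj = random_vector(feat_dim, rng)
obj = random_vector(feat_dim, rng)
union = random_vector(feat_dim, rng)

p = predicate_embedding(subj, obj, union, w_s, w_o, w_u)
print(len(p))  # one value per relation-space dimension
```

In a trained model, the resulting vector would be passed to a predicate classifier; that step is omitted here because the abstract does not specify it.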

