Visual Translation Embedding Network for Visual Relation Detection

02/27/2017
by   Hanwang Zhang, et al.
0

Visual relations, such as "person ride bike" and "bike next to car", offer a comprehensive scene understanding of an image, and have already shown their great utility in connecting computer vision and natural language. However, due to the challenging combinatorial complexity of modeling subject-predicate-object relation triplets, very little work has been done to localize and predict visual relations. Inspired by the recent advances in relational representation learning of knowledge bases and convolutional object detection networks, we propose a Visual Translation Embedding network (VTransE) for visual relation detection. VTransE places objects in a low-dimensional relation space where a relation can be modeled as a simple vector translation, i.e., subject + predicate ≈ object. We propose a novel feature extraction layer that enables object-relation knowledge transfer in a fully-convolutional fashion that supports training and inference in a single forward/backward pass. To the best of our knowledge, VTransE is the first end-to-end relation detection network. We demonstrate the effectiveness of VTransE over other state-of-the-art methods on two large-scale datasets: Visual Relationship and Visual Genome. Note that even though VTransE is a purely visual model, it is still competitive to the Lu's multi-modal model with language priors.

READ FULL TEXT

page 3

page 5

page 6

page 7

page 8

page 10

research
05/28/2019

Union Visual Translation Embedding for Visual Relationship Detection and Scene Graph Generation

Relations amongst entities play a central role in image understanding. D...
research
07/16/2018

Object Relation Detection Based on One-shot Learning

Detecting the relations among objects, such as "cat on sofa" and "person...
research
05/02/2019

Improving Visual Relation Detection using Depth Maps

State of the art visual relation detection methods have been relying on ...
research
04/27/2018

Large-Scale Visual Relationship Understanding

Large scale visual understanding is challenging, as it requires a model ...
research
12/13/2018

Detecting rare visual relations using analogies

We seek to detect visual relations in images of the form of triplets t =...
research
01/20/2021

Video Relation Detection with Trajectory-aware Multi-modal Features

Video relation detection problem refers to the detection of the relation...
research
08/26/2021

Few-shot Visual Relationship Co-localization

In this paper, given a small bag of images, each containing a common but...

Please sign up or login with your details

Forgot password? Click here to reset