Context-Dependent Diffusion Network for Visual Relationship Detection

09/11/2018
by   Zhen Cui, et al.
0

Visual relationship detection can bridge the gap between computer vision and natural language for scene understanding of images. Different from pure object recognition tasks, the relation triplets of subject-predicate-object lie on an extreme diversity space, such as person-behind-person and car-behind-building, while suffering from the problem of combinatorial explosion. In this paper, we propose a context-dependent diffusion network (CDDN) framework to deal with visual relationship detection. To capture the interactions of different object instances, two types of graphs, word semantic graph and visual scene graph, are constructed to encode global context interdependency. The semantic graph is built through language priors to model semantic correlations across objects, whilst the visual scene graph defines the connections of scene objects so as to utilize the surrounding scene information. For the graph-structured data, we design a diffusion network to adaptively aggregate information from contexts, which can effectively learn latent representations of visual relationships and well cater to visual relationship detection in view of its isomorphic invariance to graphs. Experiments on two widely-used datasets demonstrate that our proposed method is more effective and achieves the state-of-the-art performance.

READ FULL TEXT
research
03/08/2017

Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection

Despite progress in visual perception tasks such as image classification...
research
11/16/2017

Natural Language Guided Visual Relationship Detection

Reasoning about the relationships between object pairs in images is a cr...
research
10/27/2019

Leveraging Auxiliary Text for Deep Recognition of Unseen Visual Relationships

One of the most difficult tasks in scene understanding is recognizing in...
research
09/16/2013

Visual-Semantic Scene Understanding by Sharing Labels in a Context Network

We consider the problem of naming objects in complex, natural scenes con...
research
07/31/2017

Scene Graph Generation from Objects, Phrases and Region Captions

Object detection, scene graph generation and region captioning, which ar...
research
09/01/2018

Improving Visual Relationship Detection using Semantic Modeling of Scene Descriptions

Structured scene descriptions of images are useful for the automatic pro...
research
12/05/2018

Learning to Compose Dynamic Tree Structures for Visual Contexts

We propose to compose dynamic tree structures that place the objects in ...

Please sign up or login with your details

Forgot password? Click here to reset