ViP-CNN: Visual Phrase Guided Convolutional Neural Network

02/23/2017
by   Yikang Li, et al.
0

As the intermediate level task connecting image captioning and object detection, visual relationship detection started to catch researchers' attention because of its descriptive power and clear structure. It detects the objects and captures their pair-wise interactions with a subject-predicate-object triplet, e.g. person-ride-horse. In this paper, each visual relationship is considered as a phrase with three components. We formulate the visual relationship detection as three inter-connected recognition problems and propose a Visual Phrase guided Convolutional Neural Network (ViP-CNN) to address them simultaneously. In ViP-CNN, we present a Phrase-guided Message Passing Structure (PMPS) to establish the connection among relationship components and help the model consider the three problems jointly. Corresponding non-maximum suppression method and model training strategy are also proposed. Experimental results show that our ViP-CNN outperforms the state-of-art method both in speed and accuracy. We further pretrain ViP-CNN on our cleansed Visual Genome Relationship dataset, which is found to perform better than the pretraining on the ImageNet for this task.

READ FULL TEXT

page 1

page 3

page 4

research
11/16/2017

Natural Language Guided Visual Relationship Detection

Reasoning about the relationships between object pairs in images is a cr...
research
04/16/2019

Visual Relationship Detection with Language prior and Softmax

Visual relationship detection is an intermediate image understanding tas...
research
11/17/2016

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning

Visual attention has been successfully applied in structural prediction ...
research
11/27/2017

Query-Adaptive R-CNN for Open-Vocabulary Object Detection and Retrieval

We address the problem of open-vocabulary object retrieval and localizat...
research
09/01/2018

Improving Visual Relationship Detection using Semantic Modeling of Scene Descriptions

Structured scene descriptions of images are useful for the automatic pro...
research
05/28/2018

Visual Relationship Detection Based on Guided Proposals and Semantic Knowledge Distillation

A thorough comprehension of image content demands a complex grasp of the...
research
03/26/2019

Optimising the Input Image to Improve Visual Relationship Detection

Visual Relationship Detection is defined as, given an image composed of ...

Please sign up or login with your details

Forgot password? Click here to reset