Fixed-size Objects Encoding for Visual Relationship Detection

05/29/2020
by   Hengyue Pan, et al.

In this paper, we propose a fixed-size object encoding method (FOE-VRD) to improve the performance of visual relationship detection. Compared with previous methods, FOE-VRD has one important feature: it uses a single fixed-size vector to encode all objects in each input image to assist relationship detection. First, we use a regular convolutional neural network as a feature extractor to generate high-level features of the input images. Then, for each relationship triplet in an input image, i.e., <subject-predicate-object>, we apply ROI pooling to obtain feature vectors for the two regions of the feature maps that correspond to the bounding boxes of the subject and object. Besides the subject and object, our analysis implies that the results of predicate classification may also be related to the remaining objects in the input image (we call them background objects). Because the number of background objects varies across images, and because of the computational cost, we cannot generate feature vectors for them one by one using ROI pooling. Instead, we propose a novel method that encodes all background objects in each image using one fixed-size vector (i.e., the FBE vector). By concatenating the three vectors generated above, we encode all objects using one fixed-size vector. The generated feature vector is then fed into a fully connected neural network to obtain predicate classification results. Experimental results on the VRD dataset (entire set and zero-shot tests) show that the proposed method works well on both predicate classification and relationship detection.
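The pipeline described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not say how the FBE vector is built, so the sketch assumes a per-category count vector as a stand-in. The names `roi_max_pool`, `encode_background`, and `build_feature`, the 2x2 pooling grid, and the single-channel feature map are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# Box in feature-map cells: (x0, y0, x1, y1), with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def roi_max_pool(fmap: Sequence[Sequence[float]], box: Box, out: int = 2) -> List[float]:
    """Max-pool the region `box` of a 2-D feature map into out*out values."""
    x0, y0, x1, y1 = box
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out):
        ys = y0 + (i * h) // out
        ye = max(ys + 1, y0 + ((i + 1) * h) // out)  # each bin covers >= 1 row
        for j in range(out):
            xs = x0 + (j * w) // out
            xe = max(xs + 1, x0 + ((j + 1) * w) // out)  # and >= 1 column
            pooled.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
    return pooled


def encode_background(labels: Sequence[int], num_classes: int) -> List[float]:
    """ASSUMPTION: stand-in for the FBE vector, a count of background objects
    per category. Its length depends only on num_classes, not on how many
    background objects the image contains."""
    vec = [0.0] * num_classes
    for c in labels:
        vec[c] += 1.0
    return vec


def build_feature(fmap, subj_box: Box, obj_box: Box,
                  bg_labels: Sequence[int], num_classes: int, out: int = 2) -> List[float]:
    """Concatenate subject, object, and background vectors into one vector."""
    return (roi_max_pool(fmap, subj_box, out)
            + roi_max_pool(fmap, obj_box, out)
            + encode_background(bg_labels, num_classes))


# Toy 4x4 single-channel feature map with values 0..15.
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]
few = build_feature(fmap, (0, 0, 2, 2), (0, 0, 4, 4), [1, 1, 3], num_classes=4)
many = build_feature(fmap, (0, 0, 2, 2), (0, 0, 4, 4), [0, 1, 1, 2, 3, 3, 3], num_classes=4)
print(len(few), len(many))
```

Both vectors have the same length even though the two calls pass different numbers of background objects, which is the property that lets the result feed a fully connected classifier with a fixed input size.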


