Optimising the Input Image to Improve Visual Relationship Detection

03/26/2019
by   Noel Mizzi, et al.
10

Visual Relationship Detection is defined as, given an image composed of a subject and an object, the correct relation is predicted. To improve the visual part of this difficult problem, ten preprocessing methods were tested to determine whether the widely used Union method yields the optimal results. Therefore, focusing solely on predicate prediction, no object detection and linguistic knowledge were used to prevent them from affecting the comparison results. Once fine-tuned, the Visual Geometry Group models were evaluated using Recall@1, per-predicate recall, activation maximisations, class activation maps, and error analysis. From this research it was found that using preprocessing methods such as the Union-Without-Background-and-with-Binary-mask (Union-WB-and-B) method yields significantly better results than the widely used Union method since, as designed, it enables the Convolutional Neural Network to also identify the subject and object in the convolutional layers instead of solely in the fully-connected layers.

READ FULL TEXT

page 1

page 3

page 5

page 6

research
05/28/2019

Union Visual Translation Embedding for Visual Relationship Detection and Scene Graph Generation

Relations amongst entities play a central role in image understanding. D...
research
05/29/2020

Fixed-size Objects Encoding for Visual Relationship Detection

In this paper, we propose a fixed-size object encoding method (FOE-VRD) ...
research
10/15/2015

DeepProposal: Hunting Objects by Cascading Deep Convolutional Layers

In this paper we evaluate the quality of the activation layers of a conv...
research
08/16/2021

Intent-based Object Grasping by a Robot using Deep Learning

A robot needs to predict an ideal rectangle for optimal object grasping ...
research
02/23/2017

ViP-CNN: Visual Phrase Guided Convolutional Neural Network

As the intermediate level task connecting image captioning and object de...
research
12/01/2018

Classifying a specific image region using convolutional nets with an ROI mask as input

Convolutional neural nets (CNN) are the leading computer vision method f...
research
12/08/2016

Joint Hand Detection and Rotation Estimation by Using CNN

Hand detection is essential for many hand related tasks, e.g. parsing ha...

Please sign up or login with your details

Forgot password? Click here to reset