Identifying Spatial Relations in Images using Convolutional Neural Networks

06/13/2017
by Mandar Haldekar, et al.

Traditional approaches to building large-scale knowledge graphs have usually relied on extracting information (entities, their properties, and relations between them) from unstructured text (e.g., DBpedia). Recent advances in Convolutional Neural Networks (CNNs) allow us to shift our focus to learning entities and relations from images, as they build robust models that require little or no pre-processing of the images. In this paper, we present an approach to identifying and extracting spatial relations (e.g., "the girl is standing behind the table") from images using CNNs. Our research addresses two specific challenges: providing insight into how spatial relations are learned by the network, and identifying which parts of the image are used to predict these relations. We use the pre-trained network VGGNet to extract features from an image and train a Multi-layer Perceptron (MLP) on a set of synthetic images and the SUN09 dataset to extract spatial relations. The MLP predicts spatial relations without requiring a bounding box around the objects or the region of the image depicting the relation. To understand how spatial relations are represented in the network, a heatmap is overlaid on the image to show the regions the network deems important. We also analyze the MLP to show the relationship between the activation of consistent groups of nodes and the prediction of a spatial relation, and we show how the loss of these groups affects the network's ability to identify relations.


