Visual Relationship Detection with Low Rank Non-Negative Tensor Decomposition

by Mohammed Haroon Dupty, et al.

We address the problem of Visual Relationship Detection (VRD), which aims to describe the relationships between pairs of objects in the form of (subject, predicate, object) triplets. We observe that, given a pair of bounding box proposals, objects often participate in multiple relations, implying that the distribution of triplets is multimodal. We leverage the strong correlations within triplets to learn the joint distribution of triplet variables conditioned on the image and the bounding box proposals, rather than the independent distribution of triplets used hitherto. To make learning the triplet joint distribution feasible, we introduce a novel technique of learning conditional triplet distributions in the form of their normalized low rank non-negative tensor decompositions. Normalized tensor decompositions take the form of mixture distributions of discrete variables and are thus able to capture multimodality. This allows us to efficiently learn higher order discrete multimodal distributions while keeping the parameter size manageable. We further model the probability of selecting an object proposal pair and include a relation triplet prior in our model. We show that each part of the model improves performance and that the combination outperforms the state of the art on the Visual Genome (VG) and Visual Relationship Detection (VRD) datasets.
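To make the link between a normalized low rank non-negative decomposition and a mixture distribution concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy sizes, and the use of random logits in place of network outputs conditioned on the image and bounding boxes are all assumptions made for illustration. Each of the R components is a product of categorical distributions over subject, predicate, and object, and the mixture weights sum to one, so the resulting tensor is a valid joint distribution that can have several modes.

```python
import numpy as np

def softmax(x, axis=0):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def triplet_joint(w_logits, s_logits, p_logits, o_logits):
    """Joint P(subject, predicate, object) as a rank-R mixture.

    Hypothetical helper: in a real model the logits would be predicted
    from image and box features; here they are plain arrays.
    """
    w = softmax(w_logits, axis=0)   # (R,)   mixture weights, sum to 1
    A = softmax(s_logits, axis=0)   # (S, R) each column is P(subject | r)
    B = softmax(p_logits, axis=0)   # (P, R) each column is P(predicate | r)
    C = softmax(o_logits, axis=0)   # (O, R) each column is P(object | r)
    # Sum over components r of w_r * A[s, r] * B[p, r] * C[o, r].
    return np.einsum("r,sr,pr,or->spo", w, A, B, C)

# Toy sizes (assumed): 5 subject classes, 4 predicates, 5 object classes, rank 3.
rng = np.random.default_rng(0)
S, P, O, R = 5, 4, 5, 3
joint = triplet_joint(
    rng.normal(size=R),
    rng.normal(size=(S, R)),
    rng.normal(size=(P, R)),
    rng.normal(size=(O, R)),
)

full_params = S * P * O                 # entries in the unfactored joint tensor
factored_params = R * (1 + S + P + O)   # weights plus three factor matrices
```

Because every factor is normalized, the joint sums to one without any extra normalization step, and the number of values to predict grows linearly in the class counts rather than with their product.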




Tensor Composition Net for Visual Relationship Prediction

We present a novel Tensor Composition Network (TCN) to predict visual re...

CPARR: Category-based Proposal Analysis for Referring Relationships

The task of referring relationships is to localize subject and object en...

Weakly-supervised learning of visual relations

This paper introduces a novel approach for modeling visual relations bet...

Learning Mixtures of Discrete Product Distributions using Spectral Decompositions

We study the problem of learning a distribution from samples, when the u...

Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation

Understanding visual relationships involves identifying the subject, the...

Joint Probability Estimation Using Tensor Decomposition and Dictionaries

In this work, we study non-parametric estimation of joint probabilities ...
